---
title: "Making ai.rud.is Legible To Machines"
description: "A rundown of the discoverability and metadata upgrades made to this blog: ai.txt, security.txt, WebFinger, JSON Feed, enriched JSON-LD, Open Graph fixes, and Caddy configs for messing with scanners."
pubDatetime: 2026-05-23T10:00:00Z
author: hrbrmstr
tags: ["astro", "seo", "json-ld", "webfinger", "caddy", "llms-txt", "json-feed", "security-txt", "ai-txt", "structured-data", "fediverse"]
---
> Original: [Making ai.rud.is Legible To Machines](https://ai.rud.is/posts/2026-05-23-making-airudis-legible-to-machines)

The blog already had `robots.txt`, an `/llms.txt` endpoint, RSS, and per-post `.md` renditions for LLM retrieval. But a conversation with a couple of agents about 2026 best practices for site discoverability turned into a proper audit, and the audit turned into a co-working session with some additional agents. Here's what got added and why.

## The New Root-Level Files

### ai.txt

The [ai.txt specification](https://www.ai-visibility.org.uk/specifications/ai-txt/) defines a structured plain text file that declares how AI systems should interact with your content. It's the behavioral complement to `llms.txt` (which is about identity and content). Where `llms.txt` says "here's what's on my site," `ai.txt` says "here's what you can and can't do with it."

The file lives at `/ai.txt` and declares permissions (summarize, quote with attribution, include in search results, use for inference-time retrieval), restrictions (no fabricated attribution, no full reproduction, no training without permission), and attribution preferences (cite as `hrbrmstr / ai.rud.is`, include post title and date, link to the original).

The `[training]` section draws an explicit line: inference-time retrieval and citation with attribution is permitted, but content is not licensed for model training. This is advisory, not enforceable, but it establishes clear intent.

### security.txt

[RFC 9116](https://www.rfc-editor.org/rfc/rfc9116.html) defines a machine-parseable file for vulnerability disclosure contact info. For a security practitioner's blog, this is table stakes. The file lives at `/.well-known/security.txt` with `Contact`, `Expires` (one year out), `Preferred-Languages`, and `Canonical` fields.

NOTE: Caddy needs to serve it as `text/plain; charset=utf-8`.
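
As a rough illustration of the four fields listed above, here is a minimal sketch that assembles a `security.txt` body. The `buildSecurityTxt` helper and the `mailto:security@example.com` contact are placeholders invented for this example, not the values the site actually uses.

```typescript
// Hypothetical helper: assembles an RFC 9116 security.txt body.
interface SecurityTxtOptions {
  contact: string;              // a mailto: or https: URI (placeholder below)
  canonical: string;            // absolute URL the file is served from
  preferredLanguages: string[]; // language tags, e.g. ["en"]
  now?: Date;                   // injectable clock, handy for tests
}

function buildSecurityTxt(opts: SecurityTxtOptions): string {
  const now = opts.now ?? new Date();
  // "One year out": copy the date, then bump the UTC year by one.
  const expires = new Date(now.getTime());
  expires.setUTCFullYear(expires.getUTCFullYear() + 1);

  return [
    `Contact: ${opts.contact}`,
    `Expires: ${expires.toISOString()}`,
    `Preferred-Languages: ${opts.preferredLanguages.join(", ")}`,
    `Canonical: ${opts.canonical}`,
    "", // trailing newline
  ].join("\n");
}

const securityTxt = buildSecurityTxt({
  contact: "mailto:security@example.com", // placeholder, not the real contact
  canonical: "https://ai.rud.is/.well-known/security.txt",
  preferredLanguages: ["en"],
  now: new Date("2026-05-23T10:00:00Z"),
});
```

RFC 9116 recommends keeping `Expires` less than a year in the future, so a slightly shorter window than the one sketched here may be the safer choice.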

### WebFinger

WebFinger lets someone look up `@hrbrmstr@ai.rud.is` in a Fediverse client and have it resolve to the actual Mastodon profile at `mastodon.social`. The domain becomes an identity alias.

For a static site, WebFinger is a static JSON file at `/.well-known/webfinger.json` with a Caddy rewrite that routes `/.well-known/webfinger` requests to it (ignoring the query parameter, since there's only one identity on the domain):

```caddy
handle /.well-known/webfinger {
  header Content-Type "application/jrd+json"
  rewrite * /.well-known/webfinger.json
  file_server
}
```

The JSON response contains `subject`, `aliases`, and `links` pointing to the Mastodon profile. Combined with the `<a rel="me">` tag already in the HTML `<head>` and the `sameAs` array in the JSON-LD, this closes the identity verification loop across the domain, the Fediverse, and structured data.
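
For reference, a sketch of what such a document might contain. The exact contents of the site's `webfinger.json` aren't shown here, so the shape below is an assumption: it mirrors the kind of response Mastodon itself returns, using Mastodon's `/@user` and `/users/user` URL conventions. `buildWebFinger` is a hypothetical helper.

```typescript
// JSON Resource Descriptor (RFC 7033), trimmed to the fields used here.
interface JrdLink {
  rel: string;
  type?: string;
  href?: string;
}

interface Jrd {
  subject: string;
  aliases: string[];
  links: JrdLink[];
}

// Hypothetical helper: builds a JRD that points at a Mastodon account.
function buildWebFinger(user: string, instance: string): Jrd {
  const profile = `https://${instance}/@${user}`;    // human-facing profile page
  const actor = `https://${instance}/users/${user}`; // ActivityPub actor URL
  return {
    subject: `acct:${user}@${instance}`,
    aliases: [profile, actor],
    links: [
      { rel: "http://webfinger.net/rel/profile-page", type: "text/html", href: profile },
      { rel: "self", type: "application/activity+json", href: actor },
    ],
  };
}

const jrd = buildWebFinger("hrbrmstr", "mastodon.social");
const webfingerJson = JSON.stringify(jrd, null, 2);
```

Writing `webfingerJson` to `/.well-known/webfinger.json` at build time would give the static file the Caddy block serves.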

## JSON Feed

The site already had RSS at `/rss.xml`. JSON Feed 1.1 is the modern alternative, and since the data pipeline already existed in the RSS endpoint, adding `/feed.json` was nearly free.

The implementation reuses the same `getCollection("blog")` query, `getSortedPosts` filter/sort, and `getPath` URL generation. The only new code is the JSON Feed 1.1 shape:

```typescript
const feed = {
  version: "https://jsonfeed.org/version/1.1",
  title: SITE.title,
  home_page_url: SITE.website,
  feed_url: new URL("/feed.json", SITE.website).href,
  description: SITE.desc,
  items: sortedPosts.map(({ data, id, filePath }) => {
    const url = new URL(getPath(id, filePath), SITE.website).href;
    return {
      id: url,
      url,
      title: data.title,
      summary: data.description,
      date_published: new Date(data.pubDatetime).toISOString(),
      authors: [{ name: data.author }],
    };
  }),
};
```

Auto-discovery is handled by a `<link rel="alternate" type="application/feed+json">` tag in the `<head>`, alongside the existing RSS link.
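
One thing worth checking when emitting this shape: JSON Feed 1.1 expects every item to carry `content_html`, `content_text`, or both, so a summary-only item may be flagged by strict validators. The sketch below is a small sanity check along those lines. `checkFeedItem` and the example URL are hypothetical, and the checks cover only a few of the spec's rules.

```typescript
// Minimal subset of a JSON Feed 1.1 item.
interface FeedItem {
  id: string;
  url?: string;
  title?: string;
  summary?: string;
  content_html?: string;
  content_text?: string;
  date_published?: string;
}

// Returns a list of problems; an empty list means the item passed these checks.
function checkFeedItem(item: FeedItem): string[] {
  const problems: string[] = [];
  if (!item.id) problems.push("missing id");
  if (item.content_html === undefined && item.content_text === undefined) {
    problems.push("needs content_html or content_text");
  }
  if (item.date_published !== undefined && Number.isNaN(Date.parse(item.date_published))) {
    problems.push("date_published is not a parseable date");
  }
  return problems;
}

// Hypothetical item mirroring the summary-only shape.
const summaryOnly: FeedItem = {
  id: "https://ai.rud.is/posts/example",
  title: "Example post",
  summary: "A short description.",
  date_published: "2026-05-23T10:00:00.000Z",
};

// Reusing the description as plain-text content is one low-effort way to satisfy the check.
const withText: FeedItem = { ...summaryOnly, content_text: summaryOnly.summary };

const summaryOnlyProblems = checkFeedItem(summaryOnly);
const withTextProblems = checkFeedItem(withText);
```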

## Enriched JSON-LD

The site already had a `BlogPosting` JSON-LD block in the `<head>`, but it was minimal: `@type`, `headline`, `image`, `datePublished`, `dateModified`, and `author` with `name`/`url`.

The inline JSON-LD construction was extracted into a `JsonLd.astro` component and enriched on blog post pages with:

- `description` from the post frontmatter
- `url` (the canonical post URL)
- `inLanguage` from `SITE.lang`
- `publisher` (same `Person` as author -- it's a personal blog)
- `mainEntityOfPage` referencing the `WebPage`
- `keywords` joined from the post's tag array
- `sameAs` on the author object, linking Mastodon, Bluesky, GitHub, and Sourcehut

Non-post pages (home, projects, about) continue rendering the base schema without enrichment.

The `sameAs` URLs live in the `SITE` config object rather than being hardcoded in the component, so they're easy to update when profiles change:

```typescript
export const SITE = {
  // ...
  sameAs: [
    "https://mastodon.social/@hrbrmstr",
    "https://bsky.app/profile/hrbrmstr.bsky.social",
    "https://github.com/hrbrmstr",
    "https://sr.ht/~hrbrmstr",
  ],
} as const;
```
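
Putting the enrichment list together, the resulting object might look roughly like the sketch below. This is not the actual `JsonLd.astro` code: `buildBlogPostingJsonLd`, `PostMeta`, and `SiteMeta` are names made up for illustration, and `image` is left out for brevity.

```typescript
// Hypothetical input shapes; the real component reads these from props and SITE.
interface PostMeta {
  title: string;
  description: string;
  url: string;          // canonical post URL
  pubDatetime: string;  // ISO 8601
  modDatetime?: string; // ISO 8601, optional
  tags: string[];
}

interface SiteMeta {
  author: string;
  website: string;
  lang: string;
  sameAs: readonly string[];
}

function buildBlogPostingJsonLd(post: PostMeta, site: SiteMeta) {
  // Author and publisher are the same Person on a personal blog.
  const person = {
    "@type": "Person",
    name: site.author,
    url: site.website,
    sameAs: [...site.sameAs],
  };
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    description: post.description,
    url: post.url,
    inLanguage: site.lang,
    datePublished: post.pubDatetime,
    dateModified: post.modDatetime ?? post.pubDatetime,
    author: person,
    publisher: person,
    mainEntityOfPage: { "@type": "WebPage", "@id": post.url },
    keywords: post.tags.join(", "),
  };
}

const jsonLd = buildBlogPostingJsonLd(
  {
    title: "Making ai.rud.is Legible To Machines",
    description: "A rundown of the discoverability and metadata upgrades.",
    url: "https://ai.rud.is/posts/2026-05-23-making-airudis-legible-to-machines",
    pubDatetime: "2026-05-23T10:00:00Z",
    tags: ["astro", "seo", "json-ld"],
  },
  {
    author: "hrbrmstr",
    website: "https://ai.rud.is",
    lang: "en",
    sameAs: ["https://mastodon.social/@hrbrmstr", "https://github.com/hrbrmstr"],
  },
);

// The string that would go inside <script type="application/ld+json">.
const jsonLdScriptBody = JSON.stringify(jsonLd);
```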

## Open Graph Fixes

A site audit flagged several missing Open Graph meta tags. The fixes:

`og:type` now renders as `article` on blog post pages and `website` on everything else. Without an explicit `og:type`, consumers fall back to `website`, so the `article:published_time` and `article:modified_time` tags that were already present were technically orphaned.

`og:site_name` (`ai.rud.is`), `og:locale` (`en_US`), `article:author`, and per-tag `article:tag` meta tags were added. A `<link rel="author" href="/about">` establishes authorship in the HTML markup itself.

The `tags` prop was threaded from `PostDetails.astro` through `Layout.astro` to both the `JsonLd` component and the `article:tag` meta tags. On non-post pages, `tags` is `undefined` and nothing renders.
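
The branching described above can be sketched as a pure function that returns the tags to render. This is an illustration of the logic, not the code in `Layout.astro`; `buildOgMeta`, `OgInput`, and `MetaTag` are hypothetical names.

```typescript
interface OgInput {
  isPost: boolean;
  author?: string;
  tags?: string[]; // undefined on non-post pages
}

interface MetaTag {
  property: string;
  content: string;
}

function buildOgMeta({ isPost, author, tags }: OgInput): MetaTag[] {
  const meta: MetaTag[] = [
    { property: "og:type", content: isPost ? "article" : "website" },
    { property: "og:site_name", content: "ai.rud.is" },
    { property: "og:locale", content: "en_US" },
  ];
  if (isPost) {
    if (author) meta.push({ property: "article:author", content: author });
    // One article:tag per tag; nothing is emitted when tags is undefined.
    for (const tag of tags ?? []) {
      meta.push({ property: "article:tag", content: tag });
    }
  }
  return meta;
}

const postMeta = buildOgMeta({ isPost: true, author: "hrbrmstr", tags: ["astro", "seo"] });
const homeMeta = buildOgMeta({ isPost: false });
```

Each entry maps directly onto a `<meta property="..." content="...">` element in the `<head>`.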

## Markdown Link in Post Metadata

Each post's metadata line (the calendar icon + author + date row) now includes a link to the `.md` rendition of the post. It shows up as a document icon followed by "MD", separated from the date by a middot. The `Datetime.astro` component gained an optional `markdownUrl` prop that only `PostDetails.astro` passes -- card previews on the home page are unaffected.

## Caddy Configs for Scanner Entertainment

The recon traffic hitting this blog is what you'd expect: `.env` credential hunting, probes for exposed `.git/config` files, `/admin` and `/wp-login.php` probing, and `/_next/data` requests from someone who thinks this is a Next.js app.

A few Caddy handle blocks for this:

**Fake `.env` response** that wastes automated pipeline time:

```caddy
@env_hunters {
  path /.env /.env.* /.git/* /.git/config
}
handle @env_hunters {
  respond "DB_HOST=localhost
DB_USER=admin
DB_PASS=hunter2
AWS_ACCESS_KEY=AKIA3F7M9B2X4P8N1R6Q
AWS_SECRET_KEY=please_stop_scanning_my_blog
" 200
}
```

**Separate recon log** for feeding into DuckDB analysis:

```caddy
@recon_noise {
  path /.env /.env.* /.git/* /api /admin /login /signup /register
  path /dashboard /wp-admin /wp-login.php /wp-content/* /xmlrpc.php
  path /_next/* /actuator/* /solr/* /console /phpmyadmin/*
}
handle @recon_noise {
  log {
    output file /var/log/caddy/recon.log
    format json
  }
  respond 204
}
```

The recon log is the genuinely useful part -- pipe it into DuckDB, correlate with JA4 fingerprints, and the data becomes blog content that writes itself.
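
Before reaching for DuckDB, a quick look at the log can be had with a few lines of plain TypeScript. The sketch below swaps DuckDB for a simple in-memory tally and assumes Caddy's JSON access log layout, where each line is an object with a `request.uri` field. `tallyPaths` is a hypothetical helper, and the sample lines are made-up data using documentation-range IPs.

```typescript
// Only the fields this sketch reads from each Caddy JSON log line.
interface ReconEntry {
  request?: { uri?: string; remote_ip?: string };
}

// Counts requests per path, ignoring query strings and unparseable lines.
function tallyPaths(logText: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of logText.split("\n")) {
    if (line.trim() === "") continue;
    let entry: ReconEntry;
    try {
      entry = JSON.parse(line) as ReconEntry;
    } catch {
      continue; // skip truncated or non-JSON lines
    }
    const uri = entry.request?.uri;
    if (typeof uri !== "string") continue;
    const q = uri.indexOf("?");
    const path = q === -1 ? uri : uri.slice(0, q);
    counts.set(path, (counts.get(path) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample lines, trimmed to the fields used above.
const sampleLog = [
  '{"request":{"remote_ip":"192.0.2.10","uri":"/.env"},"status":204}',
  '{"request":{"remote_ip":"203.0.113.7","uri":"/.env?x=1"},"status":204}',
  '{"request":{"remote_ip":"192.0.2.10","uri":"/wp-login.php"},"status":204}',
  "not json",
].join("\n");

const pathCounts = tallyPaths(sampleLog);
```

In practice the input would come from reading `/var/log/caddy/recon.log`; grouping by `request.remote_ip` instead of path works the same way.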

## The Full Stack

After all the changes, the discoverability surface looks like this:

| File | Purpose |
|------|---------|
| `/robots.txt` | Crawl access control (unchanged) |
| `/llms.txt` | Curated content index for LLM retrieval (unchanged) |
| `/ai.txt` | AI usage permissions and restrictions (new) |
| `/.well-known/security.txt` | Vulnerability disclosure contact (new) |
| `/.well-known/webfinger` | Fediverse identity alias (new) |
| `/rss.xml` | RSS feed (unchanged) |
| `/feed.json` | JSON Feed 1.1 (new) |
| `/sitemap-index.xml` | Sitemap (unchanged) |
| Per-post `.md` files | Markdown renditions for LLM retrieval (unchanged) |

And in the HTML `<head>` of each blog post:

| Tag | Purpose |
|-----|---------|
| `og:type=article` | Correct OG type for blog posts (fixed) |
| `og:site_name`, `og:locale` | Site identity (new) |
| `article:author`, `article:tag` | Article metadata (new) |
| JSON-LD `BlogPosting` | Enriched structured data with `sameAs`, `keywords`, `publisher`, `mainEntityOfPage` (enhanced) |
| `<link rel="alternate" type="application/feed+json">` | JSON Feed auto-discovery (new) |
| `<link rel="author" href="/about">` | Author link relation (new) |

None of it's "flashy"; it's more like plumbing – keeping the machines that read your site pointed at the same consistent identity chain from your domain to your Fediverse handle to your structured data. The pieces are small and mostly boring to wire up. But a site that's legible to crawlers, citation systems, and verification tools is just more useful than one that isn't, and it doesn't take much to get there.

