The blog already had robots.txt, an /llms.txt endpoint, RSS, and per-post .md renditions for LLM retrieval. But a conversation with a couple of agents about 2026 best practices for site discoverability turned into a proper audit, and the audit turned into a co-working session with a few more agents. Here’s what got added and why.
The New Root-Level Files
ai.txt
The ai.txt specification defines a structured plain-text file that declares how AI systems should interact with your content. It’s the behavioral complement to llms.txt (which is about identity and content). Where llms.txt says “here’s what’s on my site,” ai.txt says “here’s what you can and can’t do with it.”
The file lives at /ai.txt and declares permissions (summarize, quote with attribution, include in search results, use for inference-time retrieval), restrictions (no fabricated attribution, no full reproduction, no training without permission), and attribution preferences (cite as hrbrmstr / ai.rud.is, include post title and date, link to the original).
The [training] section draws an explicit line: inference-time retrieval and citation with attribution is permitted, but content is not licensed for model training. This is advisory, not enforceable, but it establishes clear intent.
security.txt
RFC 9116 defines a machine-parseable file for vulnerability disclosure contact info. For a security practitioner’s blog, this is table stakes. The file lives at /.well-known/security.txt with Contact, Expires (one year out), Preferred-Languages, and Canonical fields.
NOTE: Caddy needs to serve it as text/plain; charset=utf-8.
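Because Expires has to be refreshed before it lapses, generating the file at build time is one option. The following is a minimal sketch, not the blog's actual build step: `buildSecurityTxt` and its option names are hypothetical, and the contact and URL values are placeholders.

```typescript
// Hypothetical helper: option names are illustrative, not from the post.
interface SecurityTxtOptions {
  contact: string; // a mailto: or https: URI
  canonical: string; // absolute URL where the file is published
  languages: string[]; // language tags, e.g. ["en"]
  now?: Date; // injectable clock so the output is reproducible
  validDays?: number; // lifetime of the file; defaults to 365 days
}

function buildSecurityTxt(opts: SecurityTxtOptions): string {
  const now = opts.now ?? new Date();
  const validDays = opts.validDays ?? 365;
  const expires = new Date(now.getTime() + validDays * 24 * 60 * 60 * 1000);
  // toISOString() yields an RFC 3339 timestamp; drop the milliseconds for readability.
  const expiresStamp = expires.toISOString().replace(/\.\d{3}Z$/, "Z");
  return (
    [
      `Contact: ${opts.contact}`,
      `Expires: ${expiresStamp}`,
      `Preferred-Languages: ${opts.languages.join(", ")}`,
      `Canonical: ${opts.canonical}`,
    ].join("\n") + "\n"
  );
}

// Placeholder values; substitute the real contact address and domain.
console.log(
  buildSecurityTxt({
    contact: "mailto:security@example.com",
    canonical: "https://example.com/.well-known/security.txt",
    languages: ["en"],
  }),
);
```

RFC 9116 recommends an Expires value less than a year in the future, so passing a smaller `validDays` and regenerating on each deploy may be the safer default.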
WebFinger
WebFinger lets someone look up @hrbrmstr@ai.rud.is in a Fediverse client and have it resolve to the actual Mastodon profile at mastodon.social. The domain becomes an identity alias.
For a static site, WebFinger is a static JSON file at /.well-known/webfinger.json with a Caddy rewrite that routes /.well-known/webfinger requests to it (ignoring the query parameter, since there’s only one identity on the domain):
handle /.well-known/webfinger {
    header Content-Type "application/jrd+json"
    rewrite * /.well-known/webfinger.json
    file_server
}
The JSON response contains subject, aliases, and links pointing to the Mastodon profile. Combined with the rel="me" link already in the page's HTML and the sameAs array in the JSON-LD, this closes the identity verification loop across the domain, the Fediverse, and structured data.
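For reference, here is a sketch of what that JRD document could look like, built in TypeScript so it can be written out at build time. The post does not show the actual file, so the details are assumptions: `buildWebfinger` is a hypothetical helper, the subject is assumed to be the canonical Mastodon account, and the `/users/<handle>` actor URL follows Mastodon's usual layout.

```typescript
interface JrdLink {
  rel: string;
  type?: string;
  href: string;
}

interface Jrd {
  subject: string;
  aliases: string[];
  links: JrdLink[];
}

// Hypothetical helper; handle and instance are parameters, not values from the post's file.
function buildWebfinger(handle: string, instance: string): Jrd {
  const profile = `https://${instance}/@${handle}`; // human-readable profile page
  const actor = `https://${instance}/users/${handle}`; // ActivityPub actor (assumed layout)
  return {
    subject: `acct:${handle}@${instance}`,
    aliases: [profile, actor],
    links: [
      { rel: "http://webfinger.net/rel/profile-page", type: "text/html", href: profile },
      { rel: "self", type: "application/activity+json", href: actor },
    ],
  };
}

// Serialize to the static file that the rewrite rule serves.
const jrd = buildWebfinger("hrbrmstr", "mastodon.social");
console.log(JSON.stringify(jrd, null, 2));
```

Clients generally follow the `self` link to fetch the actor, which is why it points at the Mastodon instance rather than at the blog's domain.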
JSON Feed
The site already had RSS at /rss.xml. JSON Feed 1.1 is the modern alternative, and since the data pipeline already existed in the RSS endpoint, adding /feed.json was nearly free.
The implementation reuses the same getCollection("blog") query, getSortedPosts filter/sort, and getPath URL generation. The only new code is the JSON Feed 1.1 shape:
const feed = {
  version: "https://jsonfeed.org/version/1.1",
  title: SITE.title,
  home_page_url: SITE.website,
  feed_url: new URL("/feed.json", SITE.website).href,
  description: SITE.desc,
  items: sortedPosts.map(({ data, id, filePath }) => {
    // Absolute post URL doubles as the stable item id.
    const url = new URL(getPath(id, filePath), SITE.website).href;
    return {
      id: url,
      url,
      title: data.title,
      summary: data.description,
      // JSON Feed requires content_html or content_text on every item;
      // the description stands in for the full text here.
      content_text: data.description,
      date_published: new Date(data.pubDatetime).toISOString(),
      authors: [{ name: data.author }],
    };
  }),
};
Auto-discovery is handled by a <link rel="alternate" type="application/feed+json"> tag in the <head>, alongside the existing RSS link.
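A quick structural check can catch a malformed feed before it ships. The sketch below is a hypothetical `checkJsonFeed` helper, not part of the site's code, and it only covers the basics JSON Feed 1.1 requires: the version URL, a title, an items array, and on each item an id plus at least one of content_html or content_text.

```typescript
type Dict = Record<string, unknown>;

function isDict(value: unknown): value is Dict {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a list of human-readable problems; an empty list means the basics are in place.
function checkJsonFeed(feed: unknown): string[] {
  if (!isDict(feed)) return ["feed is not an object"];
  const problems: string[] = [];
  if (feed["version"] !== "https://jsonfeed.org/version/1.1") {
    problems.push("version is not the JSON Feed 1.1 URL");
  }
  if (typeof feed["title"] !== "string" || feed["title"] === "") {
    problems.push("title is missing");
  }
  const items = feed["items"];
  if (!Array.isArray(items)) {
    problems.push("items is not an array");
    return problems;
  }
  items.forEach((item: unknown, i: number) => {
    if (!isDict(item)) {
      problems.push(`items[${i}] is not an object`);
      return;
    }
    if (typeof item["id"] !== "string" || item["id"] === "") {
      problems.push(`items[${i}].id is missing`);
    }
    const hasHtml = typeof item["content_html"] === "string";
    const hasText = typeof item["content_text"] === "string";
    if (!hasHtml && !hasText) {
      problems.push(`items[${i}] has neither content_html nor content_text`);
    }
  });
  return problems;
}
```

Calling `checkJsonFeed(feed)` in a build script or test and failing on a non-empty result is one way to use it; it is not a substitute for a full validator.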
Enriched JSON-LD
The site already had a BlogPosting JSON-LD block in the <head>, but it was minimal: @type, headline, image, datePublished, dateModified, and author with name/url.
The inline JSON-LD construction was extracted into a JsonLd.astro component and enriched on blog post pages with:
- description from the post frontmatter
- url (the canonical post URL)
- inLanguage from SITE.lang
- publisher (same Person as author, since it’s a personal blog)
- mainEntityOfPage referencing the WebPage
- keywords joined from the post’s tag array
- sameAs on the author object, linking Mastodon, Bluesky, GitHub, and Sourcehut
Non-post pages (home, projects, about) continue rendering the base schema without enrichment.
The sameAs URLs live in the SITE config object rather than being hardcoded in the component, so they’re easy to update when profiles change:
export const SITE = {
  // ...
  sameAs: [
    "https://mastodon.social/@hrbrmstr",
    "https://bsky.app/profile/hrbrmstr.bsky.social",
    "https://github.com/hrbrmstr",
    "https://sr.ht/~hrbrmstr",
  ],
} as const;
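To make the enriched shape concrete, here is a sketch of how the BlogPosting object could be assembled from those fields. The JsonLd.astro component itself is not shown in the post, so `buildBlogPostingJsonLd` and the `PostMeta` and `AuthorMeta` types are hypothetical, and the image property is left out for brevity.

```typescript
// Hypothetical input shapes; the real component reads from frontmatter and SITE.
interface PostMeta {
  title: string;
  description: string;
  url: string; // canonical post URL
  tags: string[];
  published: string; // ISO 8601 timestamp
  modified?: string;
}

interface AuthorMeta {
  name: string;
  url: string;
  sameAs: readonly string[];
}

function buildBlogPostingJsonLd(post: PostMeta, author: AuthorMeta, lang: string) {
  // On a personal blog the same Person serves as both author and publisher.
  const person = {
    "@type": "Person",
    name: author.name,
    url: author.url,
    sameAs: [...author.sameAs],
  };
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    description: post.description,
    url: post.url,
    inLanguage: lang,
    datePublished: post.published,
    dateModified: post.modified ?? post.published,
    keywords: post.tags.join(", "),
    mainEntityOfPage: { "@type": "WebPage", "@id": post.url },
    author: person,
    publisher: person,
  };
}
```

The result would be serialized with `JSON.stringify` into a `<script type="application/ld+json">` element in the page head.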
Open Graph Fixes
A site audit flagged several missing Open Graph meta tags. The fixes:
og:type now renders as article on blog post pages and website on everything else. Without this, the default is website, which means the article:published_time and article:modified_time tags that were already present were technically orphaned.
og:site_name (ai.rud.is), og:locale (en_US), article:author, and per-tag article:tag meta tags were added. A <link rel="author" href="/about"> declares authorship in the document markup itself.
The tags prop was threaded from PostDetails.astro through Layout.astro to both the JsonLd component and the article:tag meta tags. On non-post pages, tags is undefined and nothing renders.
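The branching described above can be sketched as a small pure function. This is an illustration, not the Layout.astro code: `buildOgTags`, `OgInput`, and `MetaTag` are hypothetical names, and the real layout emits `<meta>` elements directly rather than returning a list.

```typescript
interface MetaTag {
  property: string;
  content: string;
}

// Hypothetical input shape; tags and author are only meaningful on post pages.
interface OgInput {
  isPost: boolean;
  siteName: string;
  locale: string;
  author?: string;
  tags?: string[];
}

function buildOgTags(input: OgInput): MetaTag[] {
  const tags: MetaTag[] = [
    // article on blog posts, website everywhere else
    { property: "og:type", content: input.isPost ? "article" : "website" },
    { property: "og:site_name", content: input.siteName },
    { property: "og:locale", content: input.locale },
  ];
  if (input.isPost) {
    if (input.author) {
      tags.push({ property: "article:author", content: input.author });
    }
    // One article:tag per post tag; an undefined tags prop renders nothing.
    for (const tag of input.tags ?? []) {
      tags.push({ property: "article:tag", content: tag });
    }
  }
  return tags;
}
```

Keeping the logic in a function like this makes the post and non-post cases easy to unit test without rendering a page.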
Markdown Link in Post Metadata
Each post’s metadata line (the calendar icon + author + date row) now includes a link to the .md rendition of the post. It shows up as a document icon followed by “MD”, separated from the date by a middot. The Datetime.astro component gained an optional markdownUrl prop that only PostDetails.astro passes — card previews on the home page are unaffected.
Caddy Configs for Scanner Entertainment
The recon traffic hitting this blog is what you’d expect: .env credential hunting, .git/config leak checks, /admin and /wp-login.php probing, and /_next/data requests from someone who thinks this is a Next.js app.
A few Caddy handle blocks for this:
Fake .env response that wastes automated pipeline time:
@env_hunters {
    path /.env /.env.* /.git/* /.git/config
}
handle @env_hunters {
    respond "DB_HOST=localhost
DB_USER=admin
DB_PASS=hunter2
AWS_ACCESS_KEY=AKIA3F7M9B2X4P8N1R6Q
AWS_SECRET_KEY=please_stop_scanning_my_blog
" 200
}
Separate recon log for feeding into DuckDB analysis:
@recon_noise {
    path /.env /.env.* /.git/* /api /admin /login /signup /register
    path /dashboard /wp-admin /wp-login.php /wp-content/* /xmlrpc.php
    path /_next/* /actuator/* /solr/* /console /phpmyadmin/*
}
handle @recon_noise {
    log {
        output file /var/log/caddy/recon.log
        format json
    }
    respond 204
}
The recon log is the genuinely useful part — pipe it into DuckDB, correlate with JA4 fingerprints, and the data becomes blog content that writes itself.
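Before reaching for DuckDB, a quick tally of probed paths is enough to see what scanners are after. The sketch below is a plain TypeScript stand-in for that analysis, not the DuckDB pipeline itself. It assumes one JSON object per line with the requested path under `request.uri`, as in Caddy's JSON access logs; `tallyReconPaths` is a hypothetical helper.

```typescript
// Count hits per path, ignoring query strings and skipping lines that fail to parse.
function tallyReconPaths(logText: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of logText.split("\n")) {
    if (line.trim() === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // tolerate truncated or corrupt lines
    }
    const uri = (entry as { request?: { uri?: unknown } } | null)?.request?.uri;
    if (typeof uri !== "string") continue;
    const queryStart = uri.indexOf("?");
    const path = queryStart === -1 ? uri : uri.slice(0, queryStart);
    counts.set(path, (counts.get(path) ?? 0) + 1);
  }
  return counts;
}

// Most-probed paths first.
function topPaths(counts: Map<string, number>, limit: number): Array<[string, number]> {
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}
```

Reading /var/log/caddy/recon.log into a string and passing it through `topPaths(tallyReconPaths(text), 10)` gives a first look; joins against fingerprint data are where a real query engine earns its keep.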
The Full Stack
After all the changes, the discoverability surface looks like this:
| File | Purpose |
|---|---|
| /robots.txt | Crawl access control (unchanged) |
| /llms.txt | Curated content index for LLM retrieval (unchanged) |
| /ai.txt | AI usage permissions and restrictions (new) |
| /.well-known/security.txt | Vulnerability disclosure contact (new) |
| /.well-known/webfinger | Fediverse identity alias (new) |
| /rss.xml | RSS feed (unchanged) |
| /feed.json | JSON Feed 1.1 (new) |
| /sitemap-index.xml | Sitemap (unchanged) |
| Per-post .md files | Markdown renditions for LLM retrieval (unchanged) |
And in the HTML <head> of each blog post:
| Tag | Purpose |
|---|---|
| og:type=article | Correct OG type for blog posts (fixed) |
| og:site_name, og:locale | Site identity (new) |
| article:author, article:tag | Article metadata (new) |
| JSON-LD BlogPosting | Enriched structured data with sameAs, keywords, publisher, mainEntityOfPage (enhanced) |
| <link rel="alternate" type="application/feed+json"> | JSON Feed auto-discovery (new) |
| <link rel="author" href="/about"> | Author link relation (new) |
None of it is flashy. It’s plumbing: keeping the machines that read your site pointed at the same consistent identity chain, from your domain to your Fediverse handle to your structured data. The pieces are small and mostly boring to wire up, but a site that’s legible to crawlers, citation systems, and verification tools is more useful than one that isn’t, and it doesn’t take much to get there.