Last updated: 2026-05-14T00:07:32Z | Log entries analyzed: 15233 | Model: glm-5.1:cloud
Site Observatory: ai.rud.is
Observation window: 2026-03-10 09:10:30.393 UTC → 2026-05-13 23:58:52.156 UTC (~64 days)
Traffic Summary
15,233 requests from 4,127 unique IPs hit ai.rud.is during the observation period. The signal-to-noise ratio is roughly 49:51: legitimate traffic (visitors + owner + RSS readers) accounts for ~49.3% of requests, while automated traffic of all flavors consumes the rest.
| Class | Requests | Share | Unique IPs |
|---|---|---|---|
| Visitor | 5,659 | 37.1% | 2,085 |
| AI Crawler | 3,322 | 21.8% | 1,149 |
| Scanner | 2,064 | 13.5% | 138 |
| Other Crawler | 1,329 | 8.7% | 342 |
| RSS Reader | 1,061 | 7.0% | 7 |
| Search Crawler | 1,011 | 6.6% | 425 |
| Owner | 787 | 5.2% | 49 |
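As a quick arithmetic check, the shares in the table can be recomputed from the raw request counts. This is a minimal sketch: the counts are copied from the table, treating Visitor, Owner, and RSS Reader as "legitimate" follows the summary above, and the variable names are illustrative only.

```python
# Request counts per class, copied from the table above.
requests_by_class = {
    "Visitor": 5659,
    "AI Crawler": 3322,
    "Scanner": 2064,
    "Other Crawler": 1329,
    "RSS Reader": 1061,
    "Search Crawler": 1011,
    "Owner": 787,
}

# Classes the summary counts as legitimate traffic.
LEGITIMATE = {"Visitor", "Owner", "RSS Reader"}

total = sum(requests_by_class.values())
shares = {name: 100 * count / total for name, count in requests_by_class.items()}
legit_share = sum(shares[name] for name in LEGITIMATE)

for name, share in shares.items():
    print(f"{name:<15}{share:5.1f}%")
print(f"legitimate: {legit_share:.1f}%  automated: {100 - legit_share:.1f}%")
```

The per-class counts sum to the 15,233 total, so the classes partition the log with no unclassified requests.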
The response code distribution tells the story: 58.5% 200s, 21.1% 404s (overwhelmingly scanner exhaust), 11.8% 308 redirects (Caddy doing its job canonicalizing URLs), and 7.0% 304 conditional responses keeping bandwidth sane.
Resource Consumption
Total bytes transferred: ~217.6 MB. This undercounts logical content size — 304 and 206 responses register 0 bytes in Caddy’s logs since nothing was re-transmitted.
| Class | Bandwidth | Share |
|---|---|---|
| Visitor | 94.9 MB | 43.6% |
| AI Crawler | 54.6 MB | 25.1% |
| Other Crawler | 23.5 MB | 10.8% |
| Search Crawler | 21.9 MB | 10.1% |
| Owner | 20.4 MB | 9.4% |
| Scanner | 1.4 MB | 0.6% |
| RSS Reader | 0.8 MB | 0.4% |
Scanners generated 13.5% of requests but only 0.6% of bandwidth — nearly everything they hit was a 404, so they paid almost nothing for their noise. AI crawlers, meanwhile, consumed a quarter of all bandwidth while delivering zero human readers.
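The gap between request share and bandwidth share is easier to see as an average response size per class. The sketch below divides each class's bandwidth by its request count. It assumes 1 MB = 1000 KB (the report does not say which convention its figures use), and the names are illustrative.

```python
# (requests, bandwidth in MB) per class, copied from the two tables above.
traffic = {
    "Visitor": (5659, 94.9),
    "AI Crawler": (3322, 54.6),
    "Other Crawler": (1329, 23.5),
    "Search Crawler": (1011, 21.9),
    "Owner": (787, 20.4),
    "Scanner": (2064, 1.4),
    "RSS Reader": (1061, 0.8),
}


def kb_per_request(requests: int, megabytes: float) -> float:
    """Average response size in KB, assuming 1 MB = 1000 KB."""
    return megabytes * 1000 / requests


per_request = {name: kb_per_request(*row) for name, row in traffic.items()}

# Print classes from heaviest to lightest average response.
for name, kb in sorted(per_request.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<15}{kb:6.2f} KB/request")
```

On these numbers a scanner request averages well under 1 KB while an AI crawler request averages about 16 KB, which is why the two classes swap places between the request and bandwidth tables.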
Temporal Patterns
Human visitors follow a familiar diurnal curve: a morning commute spike at hour 7 (651 requests), an afternoon peak at hour 16 (461), and an evening session at hours 18 (332) and 23 (299). The hour 7 spike is notably sharp — nearly double the adjacent hours — suggesting a cohort checking feeds and links with morning coffee.
AI crawlers operate on a different clock. They’re remarkably flat across hours, with a notable surge at hour 9 (306 requests) and hour 15 (253). They don’t sleep, but they also don’t have a strong diurnal preference — the 24-hour crawl is real.
Scanners are the real pattern story. Three hours account for roughly 71% of all scanner traffic (1,463 of 2,064 requests): hour 1 (251), hour 5 (282), and hour 12 (930). That hour 12 spike is almost entirely attributable to two mass-scan campaigns on specific days (see Scanner & Recon section). These are burst operations, not sustained interest.
RSS readers show a disciplined polling cadence — steady across all hours with slight elevations at hours 0, 3, 6, 12, 15, 18, and 21, suggesting multiple subscribers on 3-hour or 6-hour poll intervals.
Day-of-week: Sunday leads for visitors (1,130 requests), which makes sense for a personal tech blog — people read on weekends. Wednesday is the scanner blitz day (747 requests), almost entirely from two burst campaigns. AI crawlers favor Sunday (590) and Tuesday (572) for reasons known only to their scheduling algorithms.
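The hour-of-day and day-of-week figures above come down to bucketing request timestamps. Here is a minimal sketch of that bucketing. It assumes timestamps are UTC strings in the same format this report prints, and the five sample values are borrowed from the report's own tables purely to exercise the code, not to reproduce any figure.

```python
from collections import Counter
from datetime import datetime

# Fixed English names so the output does not depend on the system locale.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def bucket(timestamps):
    """Return (requests per hour 0-23, requests per weekday name)."""
    hours, weekdays = Counter(), Counter()
    for raw in timestamps:
        ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f")
        hours[ts.hour] += 1
        weekdays[DAYS[ts.weekday()]] += 1
    return hours, weekdays


# Sample timestamps (assumed UTC), used here only as test input.
sample = [
    "2026-05-13 19:42:52.999",
    "2026-05-13 22:14:48.785",
    "2026-05-13 23:38:35.906",
    "2026-04-28 21:12:33.565",
    "2026-05-11 23:00:56.664",
]

hour_counts, weekday_counts = bucket(sample)
print(sorted(hour_counts.items()))
print(weekday_counts.most_common())
```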
Content & Visitors
The homepage dominates at 667 hits (436 visitors), which is expected. The interesting signal is in what follows:
| Page | Hits | Visitors | Avg ms |
|---|---|---|---|
| / | 667 | 436 | 13.2 |
| /posts/2026-04-04-ollama-usage/ | 125 | 111 | 15.8 |
| /posts/ | 122 | 82 | 18.0 |
| /about/ | 87 | 76 | 12.2 |
| /posts/2026-04-24-sdef-to-md-and-mcp-skill/index.png | 98 | 75 | 3.7 |
| /posts/observatory/ | 96 | 74 | 26.5 |
| /tags/ | 87 | 73 | 112.0 |
The Ollama usage post is the clear reader favorite among actual articles. The SDEF-to-MCP post attracted both readers (41 visitors to the HTML) and image hits (75 visitors to index.png). The /tags/ page is notably slow at 112 ms in the table above, several times slower than any other page listed. Since the site is static, a heavy page or an unoptimized template is a more likely cause than a dynamic query.
The /posts/observatory/ page (this report’s predecessor?) drew 74 visitors, suggesting a readership interested in the meta-layer.
Referrers
Google dominates with 61 combined referrals across www.google.com and google.com. The Baidu mobile referrals (m.baidu.com/s?wd=sheep708 and coffeew6i) are almost certainly referrer spam — those search terms are gibberish. The ghost-rider/ referrer (4 hits) is also suspicious. Self-referrals from rud.is (26 hits) reflect normal site navigation.
HTTP/3 Adoption
Caddy serves HTTP/3, and the owner uses it heavily (578 of 787 requests = 73.4%). Among actual visitors, 224 of 5,659 requests (4.0%) used HTTP/3. That’s early but real adoption. AI crawlers made zero HTTP/3 requests: they’re entirely on HTTP/2 (2,571) and HTTP/1.1 (751). RSS readers are 100% HTTP/2. Scanners are overwhelmingly HTTP/1.1 (1,789 of 2,064, or 86.7%), which tells you everything about their tooling.
TLS ALPN negotiation confirms: 45.5% h2, 34.0% un-negotiated (plain HTTP/1.1), 15.2% explicit HTTP/1.1, 5.3% h3.
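The four buckets above can be derived per log entry. The sketch below assumes Caddy's JSON access-log layout, with the HTTP version in `request.proto` and the negotiated ALPN protocol in `request.tls.proto`. Those field names should be checked against your own logs, and the sample entries are hypothetical.

```python
def alpn_bucket(entry: dict) -> str:
    """Map one access-log entry to the protocol buckets used in this report."""
    request = entry.get("request", {})
    # A missing "tls" object (plain HTTP) is treated like an empty ALPN value.
    tls = request.get("tls") or {}
    alpn = tls.get("proto") or ""

    if alpn == "h3" or request.get("proto", "").startswith("HTTP/3"):
        return "h3"
    if alpn == "h2":
        return "h2"
    if alpn == "http/1.1":
        return "explicit HTTP/1.1"
    return "un-negotiated"


# Hypothetical entries, trimmed to the fields the function reads.
samples = [
    {"request": {"proto": "HTTP/2.0", "tls": {"proto": "h2"}}},
    {"request": {"proto": "HTTP/3.0", "tls": {"proto": "h3"}}},
    {"request": {"proto": "HTTP/1.1", "tls": {"proto": "http/1.1"}}},
    {"request": {"proto": "HTTP/1.1", "tls": {"proto": ""}}},
    {"request": {"proto": "HTTP/1.1"}},
]

buckets = [alpn_bucket(entry) for entry in samples]
print(buckets)
```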
Browser Families
Chrome leads at 3,190 requests (1,212 IPs), followed by “Other” at 1,421 (454 IPs) — that’s your crawler and scanner long tail. Safari: 544 (255 IPs). Firefox: 312 (153 IPs). Edge: 164 (62 IPs). Opera: 28 (27 IPs).
RSS Activity
RSS traffic is remarkably concentrated: 3 unique IPs account for 1,039 of 1,061 classified RSS requests. These are likely a small number of feed aggregator services polling on behalf of many subscribers. The feed was hit consistently from April 20 onward, with ~48-50 requests per day in the final weeks — a steady heartbeat of about 2 polls per hour across those IPs.
AI Crawler Activity
AI crawlers generated 21.8% of all traffic — second only to human visitors. For a small personal blog, this is a staggering proportion.
| Crawler | Requests | IPs | Unique Pages |
|---|---|---|---|
| Bytespider (ByteDance) | 1,125 | 530 | 91 |
| ClaudeBot (Anthropic) | 791 | 26 | 109 |
| Amazonbot (Amazon) | 392 | 256 | 112 |
| GPTBot (OpenAI) | 359 | 16 | 147 |
| OAI-SearchBot (OpenAI) | 252 | 51 | 67 |
| Applebot (Apple) | 191 | 149 | 76 |
| Meta Crawler (Meta) | 143 | 104 | 97 |
| PerplexityBot (Perplexity) | 54 | 11 | 9 |
| CCBot (Common Crawl) | 5 | 2 | 4 |
| PetalBot (Huawei) | 3 | 3 | 1 |
| Timpibot | 2 | 2 | 1 |
| YouBot (You.com) | 2 | 1 | 2 |
Bytespider is the most aggressive by volume (1,125 requests) but spreads across 530 IPs — a distributed crawl pattern that’s characteristic of ByteDance’s infrastructure. It only touched 91 unique pages, suggesting a lot of redundant fetching.
ClaudeBot is the most thorough per-IP: 791 requests from just 26 IPs, hitting 109 unique pages. Anthropic runs a tight IP fleet and crawls deeply. This is efficient but concentrated — each IP is hitting a lot of content.
GPTBot has the highest page diversity: 147 unique pages from 16 IPs. OpenAI’s crawler is methodically working through the site’s content graph.
OAI-SearchBot (OpenAI’s search product) is a separate, lighter crawl at 252 requests across 51 IPs and 67 pages.
Amazonbot is surprisingly active: 392 requests from 256 IPs across 112 pages. That breadth suggests Amazon is investing in training data collection.
PerplexityBot is selective: 54 requests, 11 IPs, only 9 unique pages. Either it’s just discovering the site or it’s only interested in specific content.
The combined OpenAI crawl (GPTBot + OAI-SearchBot) is 611 requests, which puts OpenAI third behind Bytespider (1,125) and ClaudeBot (791). The AI industry is collectively consuming a quarter of this blog’s bandwidth to train models that will never link back.
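The "thorough per IP" and "redundant fetching" comparisons above reduce to two ratios: requests per IP and requests per unique page. A small sketch using the table's numbers for the crawlers discussed (names are illustrative):

```python
# (requests, IPs, unique pages) copied from the crawler table above.
crawlers = {
    "Bytespider": (1125, 530, 91),
    "ClaudeBot": (791, 26, 109),
    "Amazonbot": (392, 256, 112),
    "GPTBot": (359, 16, 147),
    "OAI-SearchBot": (252, 51, 67),
    "PerplexityBot": (54, 11, 9),
}


def ratios(requests: int, ips: int, pages: int) -> tuple[float, float]:
    """Return (requests per IP, requests per unique page)."""
    return requests / ips, requests / pages


profile = {name: ratios(*row) for name, row in crawlers.items()}

busiest_per_ip = max(profile, key=lambda name: profile[name][0])
most_refetched = max(profile, key=lambda name: profile[name][1])

for name, (per_ip, per_page) in profile.items():
    print(f"{name:<15}{per_ip:6.1f} req/IP {per_page:6.1f} req/page")
print("busiest per IP:", busiest_per_ip)
print("most re-fetching:", most_refetched)
```

A high requests-per-page ratio is only a rough proxy for redundant fetching, since some repeat requests are legitimate revalidation.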
Search Engine Crawlers
| Crawler | Requests | IPs | Unique Pages | Last Seen |
|---|---|---|---|---|
| Bingbot | 388 | 172 | 78 | 2026-05-13 19:42:52.999 |
| Baiduspider | 346 | 213 | 138 | 2026-05-13 22:14:48.785 |
| Googlebot | 251 | 20 | 105 | 2026-05-13 23:38:35.906 |
| YandexBot | 20 | 20 | 4 | 2026-04-28 21:12:33.565 |
| DuckDuckBot | 6 | 3 | 4 | 2026-05-11 23:00:56.664 |
Googlebot is the most IP-efficient — 20 IPs covering 105 unique pages. It’s actively and deeply indexing the site. Bingbot is present but less thorough per IP (78 pages from 172 IPs). Baiduspider is surprisingly active, hitting 138 unique pages from 213 IPs — it’s the most broadly crawling search engine by page coverage, which is interesting given the blog is in English.
YandexBot has essentially abandoned the site (4 pages, last seen April 28). DuckDuckBot barely shows up at all (6 requests total). If you care about DuckDuckGo traffic, this could be a problem, although DuckDuckGo draws much of its index from Bing, so Bingbot’s coverage likely matters more than DuckDuckBot’s.
Scanner & Recon Activity
2,064 requests from 138 IPs. The top scanners by volume:
185.177.72.38 — 662 requests in 73 seconds (12:37:31 to 12:38:45 on March 11). 661 returned 404. This was a pure directory brute-force using curl/8.7.1. No attempt to disguise. It hit 662 unique URIs in one burst and then vanished.
172.94.9.253 — 530 requests over 45 days (March 28 – May 12). 471 returned 404, but only 30 unique URIs — this IP is repeatedly probing the same small set of paths. It’s using a Firefox 124 user-agent string, which is a lazy disguise. This is the most persistent scanner in the dataset.
35.87.118.147 — 211 requests in 22 seconds on May 11. 207 were 404s across 208 unique URIs. Another mass brute-force, this time using Chrome 131 on Linux. AWS IP.
195.178.110.199 — 184 requests in 5 seconds on May 10. All 404s, 184 unique URIs. Same Chrome 131 UA pattern as the IP above — likely the same tool/operator.
104.244.74.39 — 48 requests spanning the entire observation period (March 10 – May 13), hitting only 4 unique URIs. Using Python/3.13 aiohttp/3.11.18. This is a long-lived probe checking specific paths repeatedly. It’s the most disciplined scanner here.
165.22.34.189 and 134.209.25.199 — LeakIX scanners (l9scan/2.0.731313e21353e23393e2237313). 41 and 38 requests respectively. At least they identify themselves.
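The scanners above fall into two styles: short bursts and long-lived probes. A rough way to separate them is by active window and request rate, as in this sketch. The one-hour cutoff is a hypothetical threshold chosen for illustration, not something the report defines; the request counts and durations are the ones quoted above.

```python
from dataclasses import dataclass

DAY = 86_400                   # seconds per day
BURST_WINDOW_SECONDS = 3_600   # hypothetical cutoff between burst and persistent


@dataclass
class Scanner:
    ip: str
    requests: int
    active_seconds: int  # time between first and last request

    @property
    def rate_per_second(self) -> float:
        return self.requests / self.active_seconds

    @property
    def style(self) -> str:
        if self.active_seconds <= BURST_WINDOW_SECONDS:
            return "burst"
        return "persistent"


scanners = [
    Scanner("185.177.72.38", 662, 73),
    Scanner("172.94.9.253", 530, 45 * DAY),
    Scanner("35.87.118.147", 211, 22),
    Scanner("195.178.110.199", 184, 5),
]

for s in scanners:
    print(f"{s.ip:<16}{s.style:<11}{s.rate_per_second:10.4f} req/s")
```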
What They’re Looking For
The recon URI patterns paint a clear picture — scanners think this might be a SaaS product:
| Path | Hits | Source IPs |
|---|---|---|
| /.env | 49 | 30 |
| /.git/config | 38 | 34 |
| /.env.local | 25 | 14 |
| /.env.prod | 19 | 8 |
| /_next/data | 24 | 1 |
The .env family accounts for 93 hits. Summing the per-path source counts gives 52, which is an upper bound on unique source IPs, since the same IP may probe more than one variant. The /_next/data probe (24 hits from a single IP) is framework-specific recon: someone’s checking if this is a Next.js app. The /api, /signup, /login, /register, /admin, /dashboard, /app, /product, /shop probes (each 27-34 hits, mostly from 1-2 IPs) suggest automated tools testing for common SaaS endpoints. Spoiler: it’s a static blog on Caddy.
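The probe families described above can be tagged with a few prefix checks. This is a sketch, not the classifier behind this report: the category labels are made up for illustration, and the path list is limited to the paths named in this section.

```python
# Common SaaS endpoints named in this section.
SAAS_ENDPOINTS = {
    "/api", "/signup", "/login", "/register", "/admin",
    "/dashboard", "/app", "/product", "/shop",
}


def classify_probe(path: str) -> str:
    """Return an illustrative recon category for a requested path."""
    if path.startswith("/.env"):
        return "secrets"        # .env, .env.local, .env.prod, ...
    if path.startswith("/.git/"):
        return "vcs"            # exposed repository metadata
    if path.startswith("/_next/"):
        return "framework"      # Next.js fingerprinting
    if path.rstrip("/") in SAAS_ENDPOINTS:
        return "saas-endpoint"
    return "other"


for path in ["/.env.prod", "/.git/config", "/_next/data", "/login", "/posts/"]:
    print(f"{path:<14}{classify_probe(path)}")
```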
IP Persistence
Measured by distinct active days, the most persistent IPs belong to Googlebot:
- 66.249.74.70 — 10 days, 42 requests (April 6 – April 29)
- 66.249.74.69 — 10 days, 39 requests (April 6 – April 27)
- 66.249.74.71 — 8 days, 30 requests (April 6 – April 27)
These three IPs are Google’s deep-crawl fleet, methodically working through the site over weeks.
Among human-relevant IPs:
- 172.104.210.82 — 8 days, 101 requests (April 20 – May 6). Likely a feed aggregator or very engaged reader.
- 47.160.48.231 — 8 days, 66 requests (April 21 – May 12). Another persistent visitor.
- 73.61.103.2 — 7 days, 34 requests (April 4 – May 9). Long gap between first and last seen suggests occasional return visits.
The 217.113.194.x cluster (IPs .227, .127, .125, .126, .122, .130, .228) all appear across 4-6 days each. This is almost certainly a single operator on a /24 — likely a crawler or monitoring service.
The IPv6 addresses (2a01:4f9:3a:40c9::2 and 2a14:7c1:400::1) show 6-7 day persistence with minimal requests (6-7 total), suggesting periodic automated checks from Hetzner-adjacent infrastructure.
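Spotting clusters like the 217.113.194.x group is a matter of grouping addresses by network prefix. The sketch below uses Python's standard `ipaddress` module. Grouping IPv4 by /24 follows the text above, while the /64 prefix for IPv6 is an assumption made here for illustration.

```python
import ipaddress
from collections import defaultdict


def subnet_of(ip: str) -> str:
    """Return the enclosing /24 (IPv4) or /64 (IPv6, assumed) network."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 64
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def group_by_subnet(ips):
    """Map each network to the list of observed addresses inside it."""
    groups = defaultdict(list)
    for ip in ips:
        groups[subnet_of(ip)].append(ip)
    return dict(groups)


# Addresses quoted in this section.
observed = [
    "217.113.194.227", "217.113.194.127", "217.113.194.125",
    "217.113.194.126", "217.113.194.122", "217.113.194.130",
    "217.113.194.228",
    "66.249.74.70", "66.249.74.69", "66.249.74.71",
    "2a01:4f9:3a:40c9::2",
]

groups = group_by_subnet(observed)
for network, members in groups.items():
    print(f"{network:<24}{len(members)} IPs")
```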
Observations
AI crawlers are the new scanners. They consume 25.1% of bandwidth, roughly 39x what malicious scanners consume (54.6 MB vs. 1.4 MB), while providing zero referral traffic in return. Bytespider alone accounts for more requests than all search engine crawlers combined (1,125 vs. 1,011). The social contract of web crawling (I index your content, you get traffic) is broken for AI crawlers. A robots.txt cost-benefit analysis for a personal blog has never been more stark.
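If that cost-benefit analysis comes out in favor of blocking, one option is a robots.txt that disallows the AI crawlers while leaving everything else open. The sketch below builds such a file and checks it with Python's standard `urllib.robotparser`. The user-agent tokens are assumed to match the crawler names in the table above and should be confirmed against each vendor's documentation. robots.txt is also advisory, so this only affects crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt tokens, taken from the crawler names in this report.
AI_CRAWLER_TOKENS = [
    "Bytespider", "ClaudeBot", "Amazonbot", "GPTBot",
    "OAI-SearchBot", "PerplexityBot", "CCBot",
]


def build_robots_txt(blocked_agents) -> str:
    """One Disallow group per blocked agent, then an allow-all default."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)


robots_txt = build_robots_txt(AI_CRAWLER_TOKENS)

# Parse the generated text locally; no network request is made.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://ai.rud.is/posts/"
gptbot_allowed = parser.can_fetch("GPTBot", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

print(robots_txt)
print("GPTBot allowed:", gptbot_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

Keeping search crawlers in the default allow-all group preserves the traffic-for-indexing trade that still works.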
The scanner traffic is almost entirely noise. 21.1% of all requests returned 404. The top four scanner IPs alone generated 1,587 requests with a 96% 404 rate (1,523 of 1,587). Caddy serves these responses in ~1ms median (p50: 1.34ms), so the compute cost is negligible, but it’s still 1,587 log entries you’ll never care about. The 104.244.74.39 Python/aiohttp probe is the exception: it’s patient, targeted, and worth watching.
HTTP/3 is a real protocol with real users, but almost only humans use it. 73.4% of owner traffic and 4.0% of visitor traffic uses HTTP/3. Zero AI crawler requests, zero RSS reader requests, and one scanner request use it. The h3 adoption is driven almost entirely by browsers with QUIC support. If you’re optimizing for AI crawler efficiency, HTTP/2 is the ceiling they’ll meet you at: every AI crawler request used HTTP/2 or HTTP/1.1.
Generated by observatory.sh — Caddy logs → DuckDB → Ollama → Astro