Last updated: 2026-06-07T23:51:51Z | Log entries analyzed: 42321 | Model: deepseek-v4-flash:cloud | Enrichment: Censys
Site Observatory Report — ai.rud.is
Report date: 2026-06-07 (data for the previous day)
Observation period: 2026-03-10 09:10:30 UTC – 2026-06-07 23:46:22 UTC
Total requests: 42,321 | Unique IPs: 9,425
What Changed
This is an incremental run. The report day is 2026-06-07.
New scanner IPs:
- 205.210.31.36: 1 request, 0 404s. User-agent: "Hello from Palo Alto Networks, find out more about our scans in https://docs-cortex.paloaltonetworks.com/r/1/Cortex-Xpanse/Scanning-activity". First seen 2026-06-07 13:24:13.613+00. A polite, well-documented scan.
New recon URIs: None.
Volume trends (today vs. trailing 7‑day average):
- fediverse: 791 vs 788.4 (ratio 1.0), steady.
- visitor: 139 vs 935.3 (ratio 0.15), sharp drop.
- ai_crawler: 54 vs 92.6 (ratio 0.58), below average.
- scanner: 2 vs 149.9 (ratio 0.01), nearly absent.
- owner: 3 vs 28.7 (ratio 0.1), low.
- Other classes (rss_reader, other_crawler, search_crawler) all near baseline (0.83–0.99).
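Each ratio above is the day's request count divided by the trailing 7-day average. A minimal Python sketch of that arithmetic follows; the label thresholds in `describe` are assumptions chosen for illustration, since the report does not state its cutoffs.

```python
def trend_ratio(today: int, trailing_avg: float) -> float:
    """Today's request count relative to the trailing 7-day average."""
    if trailing_avg <= 0:
        raise ValueError("trailing average must be positive")
    return today / trailing_avg


def describe(ratio: float) -> str:
    # Thresholds are illustrative assumptions, not the report's actual cutoffs.
    if ratio < 0.25:
        return "sharp drop"
    if ratio < 0.8:
        return "below average"
    if ratio <= 1.25:
        return "steady"
    return "above average"


# (today, trailing 7-day average) pairs taken from the list above.
observed = {
    "fediverse": (791, 788.4),
    "visitor": (139, 935.3),
    "ai_crawler": (54, 92.6),
    "scanner": (2, 149.9),
    "owner": (3, 28.7),
}

ratios = {cls: round(trend_ratio(t, avg), 2) for cls, (t, avg) in observed.items()}
for cls, r in ratios.items():
    print(f"{cls:<12} {r:>5.2f}  {describe(r)}")
```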
Canary hits: No new triggers today. The existing decoy paths (.env, .git/config) continue to return 200 to scanners — 39 and 22 hits respectively over the full period.
Classification gaps: No new bot‑like UAs appeared today. The usual suspects remain: TLM-Audit-Scanner/1.0 (4,564 requests), AIWebIndex/2.0 (329), ChatGPT-User/1.0 (281), Seamus the Search Engine (225), Barkrowler (207), and others. These need classifier rules.
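The missing classifier rules could be sketched as an ordered list of user-agent substring matches. Only the TLM-Audit-Scanner to scanner mapping is recommended elsewhere in this report; every other class assignment below, and the `classify_gap_ua` name itself, is a hypothetical placeholder to be confirmed before use.

```python
# Ordered (substring, traffic class) rules; the first match wins.
# Only TLM-Audit-Scanner -> scanner comes from the report; the remaining
# class assignments are assumptions for illustration.
GAP_RULES = [
    ("tlm-audit-scanner", "scanner"),
    ("aiwebindex", "ai_crawler"),
    ("chatgpt-user", "ai_crawler"),
    ("seamus the search engine", "search_crawler"),
    ("barkrowler", "other_crawler"),
]


def classify_gap_ua(user_agent: str, default: str = "visitor") -> str:
    """Return the traffic class for a user-agent, or `default` if no rule matches."""
    ua = user_agent.lower()
    for needle, traffic_class in GAP_RULES:
        if needle in ua:
            return traffic_class
    return default


print(classify_gap_ua("TLM-Audit-Scanner/1.0"))
```

Matching on lowercase substrings keeps the rules tolerant of version bumps such as AIWebIndex/2.0 becoming a later release.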
Traffic Summary
On 2026-06-07 the site served 1,102 requests from 864 unique IPs. Over the full observation window, the signal‑to‑noise ratio is healthy: scanner traffic accounts for only 8.4% of all requests (3,559 of 42,321). The largest traffic classes by volume are visitors (35.6%), fediverse link‑preview fetches (25.8%), and AI crawlers (12.9%).
Resource Consumption
Bandwidth figures reflect bytes actually transferred (conditional/cached responses show 0 bytes, so totals undercount logical content size).
| Traffic Class | Bytes Transferred | % of Total | Requests |
|---|---|---|---|
| fediverse | 264,598,257 | 41.6% | 10,903 |
| visitor | 175,954,305 | 27.7% | 15,077 |
| ai_crawler | 90,845,995 | 14.3% | 5,464 |
| search_crawler | 36,919,184 | 5.8% | 1,829 |
| other_crawler | 33,704,695 | 5.3% | 2,090 |
| owner | 30,239,720 | 4.8% | 1,123 |
| scanner | 2,553,758 | 0.4% | 3,559 |
| rss_reader | 1,119,037 | 0.2% | 2,276 |
Fediverse link-preview fetches consume the most bandwidth despite being lightweight single-page requests, a consequence of many instances repeatedly polling the root path (/).
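The percentage column, and the average response size per class, can be recomputed from the table with a few lines of Python. This is a sketch over the figures as printed, not the pipeline's own query.

```python
# (bytes transferred, requests) per traffic class, copied from the table above.
usage = {
    "fediverse": (264_598_257, 10_903),
    "visitor": (175_954_305, 15_077),
    "ai_crawler": (90_845_995, 5_464),
    "search_crawler": (36_919_184, 1_829),
    "other_crawler": (33_704_695, 2_090),
    "owner": (30_239_720, 1_123),
    "scanner": (2_553_758, 3_559),
    "rss_reader": (1_119_037, 2_276),
}

total_bytes = sum(b for b, _ in usage.values())
total_requests = sum(r for _, r in usage.values())

print(f"{'class':<15}{'% bytes':>9}{'% reqs':>9}{'bytes/req':>11}")
for cls, (nbytes, reqs) in usage.items():
    print(
        f"{cls:<15}"
        f"{100 * nbytes / total_bytes:>8.1f}%"
        f"{100 * reqs / total_requests:>8.1f}%"
        f"{nbytes / reqs:>11,.0f}"
    )
```

The bytes-per-request column makes the contrast concrete: fediverse fetches average roughly 24 KB each, while scanner requests average under 1 KB because most of them end in a 404.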
Temporal Patterns
Hourly cadence:
- Visitors peak between 08:00–09:00 UTC (1,595 requests) and again at 18:00–19:00 UTC (1,665 and 1,525). This aligns with European/American daytime.
- Scanners are most active at 05:00 UTC (502 requests) and 12:00 UTC (935) — likely automated campaigns running on cron.
- Fediverse activity is concentrated between 10:00–17:00 UTC, peaking at 13:00 UTC (1,499). This is when Mastodon/Misskey instances fetch link previews after a post is shared.
- AI crawlers show a broad plateau from 09:00–21:00 UTC, with a slight dip overnight.
Day‑of‑week patterns:
- Tuesday stands out with 6,124 visitor requests — almost entirely due to the June 2 anomaly (4,937 visitor requests that day, likely a viral post or referral spike).
- Sundays and Thursdays see elevated AI crawler activity (889 each).
- Scanner activity is highest on Tuesday (843) and Wednesday (940), lowest on Thursday (151) and Sunday (244).
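The hourly and day-of-week figures are group-by counts over UTC timestamps. The pipeline runs on DuckDB; the sketch below shows the same bucketing in plain Python, using a handful of hypothetical records rather than real log data.

```python
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical (ISO timestamp, traffic class) records for illustration only.
records = [
    ("2026-06-02T08:15:00+00:00", "visitor"),
    ("2026-06-02T18:40:00+00:00", "visitor"),
    ("2026-06-03T12:05:00+00:00", "scanner"),
    ("2026-06-07T13:24:13+00:00", "scanner"),
]

by_hour = Counter()
by_weekday = Counter()
for ts, traffic_class in records:
    # Normalise to UTC so the hour buckets match the report's convention.
    when = datetime.fromisoformat(ts).astimezone(timezone.utc)
    by_hour[(traffic_class, when.hour)] += 1
    by_weekday[(traffic_class, WEEKDAYS[when.weekday()])] += 1

for key, count in sorted(by_weekday.items()):
    print(key, count)
```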
Content & Visitors
Top pages (all time):
- /: 1,168 hits, 777 visitors
- /posts/2026-04-04-ollama-usage/: 251 hits
- /posts/2026-04-21-starlog-stars-are-better-off-without-us/: 196 hits
- /posts/observatory/: 201 hits (this report)
- /about/: 128 hits
Referrers: Google dominates (232 from https://www.google.com/, 31 from google.com). rud.is (the parent domain) sends 37 referrals. A handful from inoreader.com (4) and m.baidu.com (4).
Protocol adoption:
- HTTP/2 is the most common negotiated protocol (34.1% of requests), followed by HTTP/1.1 (10.3%).
- HTTP/3 (h3) accounts for 2.9% of requests (1,215). Among visitors, 335 requests used HTTP/3; owner traffic uses it heavily (837 of 1,123). Scanners rarely use it (38).
- 46% of requests have an empty negotiated protocol field — these are likely non‑TLS or early‑stage connections.
Browser families (visitor class):
- Chrome: 5,714 requests (2,317 IPs)
- Safari: 1,144 (525 IPs)
- Firefox: 578 (242 IPs)
- Edge: 316 (94 IPs)
- Opera: 88 (86 IPs)
- “Other” (includes bots, curl, etc.): 7,237 (771 IPs)
RSS subscribers: 3 IPs account for 2,234 rss_reader requests. A small but loyal audience.
Agent‑Artifact Access Patterns
Since 2026-05-13 the site serves llms.txt and .md versions of every post for LLM agents. Over the full period:
- llms.txt: 29 requests, 25 unique IPs, 77,643 bytes transferred.
- Post .md files: 379 requests, 188 unique IPs, 785,454 bytes transferred.
Top consumers of agent artifacts (by user‑agent):
- TLM-Audit-Scanner/1.0: 68 requests, 9 artifacts (this is a scanner, not an AI crawler)
- Bytespider: 30 requests, 2 artifacts
- Amazonbot: 24 requests, 21 artifacts
- Googlebot: 21 requests, 11 artifacts
- ClaudeBot: 18 requests, 18 artifacts
- MJ12bot: 18 requests, 13 artifacts
- Baiduspider: 16 requests, 7 artifacts
- AIWebIndex/2.0: 15 requests, 15 artifacts
- Applebot: 14 requests, 6 artifacts
- GPTBot: 10 requests, 9 artifacts
Most‑accessed post .md files:
- /posts/2026-04-04-ollama-usage.md: 31 requests (24 from AI crawlers)
- /posts/2026-05-22-fascine-siege-works.md: 28 requests (10 AI, 9 search)
- /posts/2026-04-04-outline-bookmark-ext.md: 27 requests (18 AI)
- /posts/2026-05-23-starlog-and-the-case-of-the-missing-feed.md: 21 requests (8 AI, 5 search)
- /posts/2026-05-23-making-airudis-legible-to-machines.md: 19 requests (8 AI, 3 search)
Comparison to HTML views: The most popular .md file (ollama-usage.md, 31 requests) drew roughly one-eighth the traffic of its HTML counterpart (/posts/2026-04-04-ollama-usage/, 251 hits). Agent-artifact traffic is still a small fraction of total content consumption, but it is being used by the intended audience: AI crawlers account for the majority of .md requests. The notable exception is TLM-Audit-Scanner, a scanner that is currently classified as a visitor; it should be reclassified.
Bandwidth cost: 785 KB for all post .md files and 78 KB for llms.txt — negligible compared to the 636 MB total served.
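Counting agent-artifact traffic comes down to recognising two kinds of request path. A minimal sketch, assuming llms.txt is served from the site root and the .md versions live under /posts/ (the latter matches the paths listed above; the former is an assumption), with `artifact_kind` as a hypothetical helper name:

```python
from typing import Optional


def artifact_kind(path: str) -> Optional[str]:
    """Classify a request path as an agent artifact, or None for regular content."""
    clean = path.split("?", 1)[0]  # ignore any query string
    if clean == "/llms.txt":
        return "llms.txt"
    if clean.startswith("/posts/") and clean.endswith(".md"):
        return "post_md"
    return None


for p in ("/llms.txt", "/posts/2026-04-04-ollama-usage.md",
          "/posts/2026-04-04-ollama-usage/"):
    print(p, "->", artifact_kind(p))
```

Keeping the HTML and .md paths in separate buckets is what allows the per-post comparison made above.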
AI Crawler Activity
| Crawler | Requests | IPs | Unique Pages |
|---|---|---|---|
| Bytespider (ByteDance) | 1,663 | 698 | 127 |
| ClaudeBot (Anthropic) | 1,121 | 44 | 171 |
| Amazonbot (Amazon) | 716 | 341 | 196 |
| OAI-SearchBot (OpenAI) | 538 | 76 | 138 |
| GPTBot (OpenAI) | 451 | 20 | 210 |
| Applebot (Apple) | 312 | 214 | 135 |
| Meta Crawler (Meta) | 224 | 151 | 144 |
| PetalBot (Huawei) | 166 | 20 | 106 |
| CCBot (Common Crawl) | 131 | 6 | 104 |
| PerplexityBot | 113 | 11 | 19 |
| Other AI | 23 | 12 | 12 |
| Timpibot | 4 | 4 | 1 |
| YouBot (You.com) | 2 | 1 | 2 |
Bytespider is the most aggressive by request count and IP spread (698 IPs). ClaudeBot is more concentrated (44 IPs) and covers 171 unique pages. GPTBot covers the most unique pages (210) from only 20 IPs, which suggests thorough indexing. For a small personal blog, this level of AI crawler attention is notable but not burdensome; AI crawlers consumed roughly 90 MB (14.3% of all bytes transferred).
Search Engine Crawlers
| Crawler | Requests | IPs | Unique Pages | Last Seen (UTC) |
|---|---|---|---|---|
| Bingbot | 615 | 214 | 130 | 2026-06-07 23:03:22.763 |
| Baiduspider | 549 | 261 | 181 | 2026-06-07 21:39:22.556 |
| Googlebot | 531 | 47 | 184 | 2026-06-07 10:04:13.435 |
| Yandex | 68 | 57 | 6 | 2026-06-07 15:30:29.971 |
| DuckDuckBot | 66 | 3 | 4 | 2026-06-07 14:12:18.044 |
All major engines are actively indexing. Googlebot is efficient (47 IPs, 184 pages). Baiduspider uses many IPs (261) and covers 181 pages. Bingbot is also thorough. DuckDuckBot and Yandex are light.
Fediverse Activity
Fediverse link‑preview fetches are a steady background hum. Over the full period, 10,903 requests from 2,645 unique IPs. The most active instances (by request count):
- Misskey/2025.4.7 (https://calckey.world/): 19 requests
- Misskey/2025.4.7 (https://social.louis-vallat.dev/): 18
- Misskey/2025.4.6 (https://transfem.social/): 18
- Misskey/2025.4.7 (https://evy.pet/): 18 (2 IPs)
- Misskey/2025.4.7 (https://federation.network/): 17
- Mastodon/4.6.0-nightly.2026-05-21+chuckya (https://rivals.space/): 15
- Mastodon/4.5.6 (https://veganism.social/): 14
- Mastodon/4.6.0-alpha.8+glitch (https://is-a.cat/): 14 (2 IPs)
- Mastodon/4.5.3 (https://publicsquare.global/): 14
- Mastodon/4.5.7 (https://patrickflynn.me/): 14
These are distinct instances, not a single recurring reader. The diversity suggests the site is being shared across the fediverse regularly.
Scanner & Recon Activity
Top scanners (all time):
- 185.177.72.38: 662 requests, 661 404s, UA curl/8.7.1. A brute-force recon sweep.
- 172.94.9.253: 530 requests, 471 404s, UA Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0. Persistent over two months.
- 45.148.10.95: 393 requests, 393 404s, UA TLM-Audit-Scanner/1.0.
- 195.178.110.199: 315 requests, 315 404s, UA Chrome 131.
- 35.87.118.147 and 16.171.255.183: each 211 requests, 207 404s, UA Chrome 131 (likely same campaign).
- 16.52.148.103 and 54.90.227.81: each 178 requests, 174 404s, UA Chrome 131 (another campaign).
- 104.244.74.39: 60 requests, 60 404s, UA Python/3.13 aiohttp/3.11.18.
- 165.22.34.189 and 134.209.25.199: l9scan/2.0 (LeakIX).
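What most of these scanners share is a request stream made up almost entirely of 404s. One way to surface such IPs is a simple 404-ratio filter. In the sketch below, the `flag_scanners` name and both thresholds are assumptions for illustration rather than the report's classifier settings, and the events are synthetic, shaped to match two entries from the list.

```python
from collections import defaultdict


def flag_scanners(events, min_requests=50, min_404_ratio=0.85):
    """Return {ip: (requests, 404s)} for IPs whose traffic is mostly 404s.

    `events` is an iterable of (ip, status) pairs. Both thresholds are
    illustrative assumptions.
    """
    totals = defaultdict(lambda: [0, 0])
    for ip, status in events:
        totals[ip][0] += 1
        if status == 404:
            totals[ip][1] += 1
    return {
        ip: (reqs, misses)
        for ip, (reqs, misses) in totals.items()
        if reqs >= min_requests and misses / reqs >= min_404_ratio
    }


# Synthetic events: counts mirror the list above; non-404 statuses are assumed 200.
events = (
    [("185.177.72.38", 404)] * 661 + [("185.177.72.38", 200)]
    + [("172.94.9.253", 404)] * 471 + [("172.94.9.253", 200)] * 59
    + [("203.0.113.7", 200)] * 80  # hypothetical ordinary reader
)
flagged = flag_scanners(events)
print(flagged)
```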
Recon URIs (most hit):
- /.env: 67 hits from 41 IPs
- /.git/config: 49 hits from 45 IPs
- /backend/.env: 34 hits
- /api: 34 hits from 3 IPs
- /.env.local: 33 hits
- /en: 32 hits from 1 IP (likely a path traversal probe)
- /signup, /login, /admin, /register, /dashboard, /product, /_next/data: all common CMS/API probes.
Enrichment: No enrichment data is available for the scanner IPs listed above. The ENRICHMENT block contains only benign crawler IPs (Amazonbot, Applebot, Bytespider, Facebook). For scanner IPs, we lack Censys context.
Canary hits: The site returns 200 for .env (39 hits, 25 IPs) and .git/config (22 hits, 17 IPs). These are decoy responses — the files do not exist, but the server deliberately serves a small payload to trap scanners. The canaries are working.
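Since the decoy paths answer with 200 instead of 404, a canary hit can be recognised from the path and status alone. A minimal sketch, assuming log events are available as (ip, path, status) tuples; the function names are hypothetical.

```python
# Decoy paths named in the report; a 200 on one of these counts as a canary hit.
CANARY_PATHS = {"/.env", "/.git/config"}


def is_canary_hit(path: str, status: int) -> bool:
    """True when a request fetched a decoy path and received the decoy payload."""
    return status == 200 and path.split("?", 1)[0] in CANARY_PATHS


def canary_summary(events):
    """Summarise (ip, path, status) events as {path: (hits, unique_ips)}."""
    hits, ips = {}, {}
    for ip, path, status in events:
        if is_canary_hit(path, status):
            key = path.split("?", 1)[0]
            hits[key] = hits.get(key, 0) + 1
            ips.setdefault(key, set()).add(ip)
    return {p: (hits[p], len(ips[p])) for p in hits}


# Hypothetical events using documentation-range addresses.
sample = [
    ("198.51.100.1", "/.env", 200),
    ("198.51.100.1", "/.env", 200),
    ("198.51.100.2", "/.env", 200),
    ("198.51.100.3", "/.git/config", 404),
    ("198.51.100.3", "/about/", 200),
]
print(canary_summary(sample))
```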
IP Persistence
The following IPs appear across multiple days (approximation — IPv6 privacy extensions and NAT mean IP ≠ individual):
- 47.160.48.231: 12 days, 79 requests (likely a real reader)
- 206.223.233.171: 12 days, 21 requests
- 73.61.103.2: 10 days, 46 requests
- 66.249.74.70: 10 days, 42 requests (Googlebot)
- 66.249.74.69: 10 days, 39 requests (Googlebot)
- 217.113.194.125: 10 days, 20 requests (Barkrowler? See classification gaps)
- 2a01:4f9:3a:40c9::2 (Hetzner IPv6, likely a fediverse instance): 10 days, 10 requests
- 172.104.210.82: 8 days, 101 requests (likely a reader)
- 103.109.101.75: 8 days, 30 requests
- 92.247.181.45: 7 days, 54 requests
- 178.105.195.233: 6 days, 76 requests
- 46.225.147.220: 6 days, 62 requests
The 217.113.194.x range (multiple IPs) appears frequently — these are likely Barkrowler (a crawler from babbar.tech) which is in the classification gaps. The 66.249.74.x range is Googlebot.
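Persistence here means the number of distinct dates on which an IP appears, and ranges such as 217.113.194.x show up once addresses are collapsed to their network. A sketch using Python's standard `ipaddress` module; the /24 and /64 prefix lengths are assumptions chosen for illustration.

```python
import ipaddress
from collections import defaultdict


def days_seen(events):
    """Map each IP to the number of distinct dates it appears on.

    `events` is an iterable of (ip, "YYYY-MM-DD") pairs.
    """
    dates = defaultdict(set)
    for ip, day in events:
        dates[ip].add(day)
    return {ip: len(d) for ip, d in dates.items()}


def network_key(ip: str) -> str:
    """Collapse an address to its /24 (IPv4) or /64 (IPv6) network."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 64  # assumed grouping widths
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


print(network_key("217.113.194.125"))
print(network_key("2a01:4f9:3a:40c9::2"))
```

Grouping IPv6 addresses by /64 also softens the privacy-extension problem noted above, since rotating interface identifiers stay inside the same /64.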
Observations
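- Visitor traffic fell sharply on June 7: 139 requests vs. a 7-day average of 935. Previous Sundays had 221, 93, 159, 229, 102, and 241 requests, so 139 is within the usual Sunday range. The gap comes mostly from the baseline: the trailing 7-day average still includes the June 2 spike (4,937 visitor requests). That is consistent with a referral source drying up or a post losing its viral momentum. Worth investigating the referrer logs for the prior days.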
- The TLM-Audit-Scanner is the top consumer of .md files: 68 requests, more than any AI crawler. This scanner is already in the classification gaps and should be reclassified from visitor to scanner. It's ironic that a security scanner is the heaviest user of the "designed for AI agents" artifacts.
- Canary decoys are working: .env and .git/config return 200 to scanners, and the hits are ongoing. The 264 bytes served for .git/config is a tiny cost for early-warning detection. No new canary triggers appeared today, but the existing ones continue to attract attention.
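One way to test for a weekend effect is to compare the day against earlier same-weekday values instead of the 7-day average. The sketch below uses the Sunday counts quoted in the first observation, which are assumed here to be visitor-class request counts.

```python
from statistics import mean

previous_sundays = [221, 93, 159, 229, 102, 241]  # from the first observation
today = 139               # visitor requests on 2026-06-07
trailing_7day_avg = 935.3

sunday_mean = mean(previous_sundays)
ratio_vs_sundays = today / sunday_mean
ratio_vs_week = today / trailing_7day_avg
within_sunday_range = min(previous_sundays) <= today <= max(previous_sundays)

print(f"mean of previous Sundays: {sunday_mean:.1f}")
print(f"ratio vs Sundays:         {ratio_vs_sundays:.2f}")
print(f"ratio vs 7-day average:   {ratio_vs_week:.2f}")
print(f"within Sunday range:      {within_sunday_range}")
```

Reading the two ratios side by side helps separate an ordinary quiet Sunday from a baseline inflated by a single spike day.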
Generated by — Caddy logs → DuckDB → Censys → Ollama → Astro