
Site Observatory

parallax-agent Updated: MD

Last updated: 2026-06-07T23:51:51Z | Log entries analyzed: 42321 | Model: deepseek-v4-flash:cloud | Enrichment: Censys

Site Observatory Report — ai.rud.is

Report date: 2026-06-07 (data for the previous day)
Observation period: 2026-03-10 09:10:30 UTC – 2026-06-07 23:46:22 UTC
Total requests: 42,321 | Unique IPs: 9,425


What Changed

This is an incremental run. The report day is 2026-06-07.

New scanner IPs:

New recon URIs: None.

Volume trends (today vs. trailing 7‑day average):

Canary hits: No new triggers today. The existing decoy paths (.env, .git/config) continue to return 200 to scanners — 39 and 22 hits respectively over the full period.

Classification gaps: No new bot‑like UAs appeared today. The usual suspects remain: TLM-Audit-Scanner/1.0 (4,564 requests), AIWebIndex/2.0 (329), ChatGPT-User/1.0 (281), Seamus the Search Engine (225), Barkrowler (207), and others. These need classifier rules.
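A minimal sketch of what such classifier rules could look like, assuming a case-insensitive substring match on the User-Agent header. The rule table and the `classify_ua` function are hypothetical, not the report pipeline's actual code. Only the user-agent names come from the list above, and only the TLM-Audit-Scanner assignment is stated in the report; the other class assignments are assumptions.

```python
# Hypothetical rule table: (substring to look for, traffic class to assign).
# Only TLM-Audit-Scanner -> scanner is stated in the report; the other
# class assignments are assumptions made for illustration.
RULES = [
    ("tlm-audit-scanner", "scanner"),
    ("aiwebindex", "ai_crawler"),
    ("chatgpt-user", "ai_crawler"),
    ("seamus the search engine", "search_crawler"),
    ("barkrowler", "other_crawler"),
]


def classify_ua(user_agent: str, default: str = "visitor") -> str:
    """Return the class of the first rule whose substring appears in the UA."""
    ua = user_agent.lower()
    for needle, traffic_class in RULES:
        if needle in ua:
            return traffic_class
    return default


if __name__ == "__main__":
    for ua in ("TLM-Audit-Scanner/1.0", "AIWebIndex/2.0", "Barkrowler"):
        print(ua, "->", classify_ua(ua))
```

First-match-wins keeps the rules easy to audit; more specific substrings would need to be listed before more general ones.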


Traffic Summary

On 2026-06-07 the site served 1,102 requests from 864 unique IPs. Over the full observation window, the signal‑to‑noise ratio is healthy: scanner traffic accounts for only 8.4% of all requests (3,559 of 42,321). The largest traffic classes by volume are visitors (35.6%), fediverse link‑preview fetches (25.8%), and AI crawlers (12.9%).

Resource Consumption

Bandwidth figures reflect bytes actually transferred (conditional/cached responses show 0 bytes, so totals undercount logical content size).

Traffic Class | Bytes Transferred | % of Total Bytes | Requests
fediverse | 264,598,257 | 41.6% | 10,903
visitor | 175,954,305 | 27.7% | 15,077
ai_crawler | 90,845,995 | 14.3% | 5,464
search_crawler | 36,919,184 | 5.8% | 1,829
other_crawler | 33,704,695 | 5.3% | 2,090
owner | 30,239,720 | 4.8% | 1,123
scanner | 2,553,758 | 0.4% | 3,559
rss_reader | 1,119,037 | 0.2% | 2,276

Fediverse link‑preview fetches consume the most bandwidth despite being lightweight single‑page requests, a consequence of many instances repeatedly fetching the root path (/).
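The shares in the table can be re-derived from the raw byte and request counts. The sketch below is illustrative arithmetic only; the `summarize` helper is a hypothetical name, and the numbers are copied from the table above.

```python
# (bytes transferred, requests) per traffic class, copied from the table above.
TRAFFIC = {
    "fediverse": (264_598_257, 10_903),
    "visitor": (175_954_305, 15_077),
    "ai_crawler": (90_845_995, 5_464),
    "search_crawler": (36_919_184, 1_829),
    "other_crawler": (33_704_695, 2_090),
    "owner": (30_239_720, 1_123),
    "scanner": (2_553_758, 3_559),
    "rss_reader": (1_119_037, 2_276),
}


def summarize(traffic):
    """Return {class: (byte share %, request share %, mean bytes per request)}."""
    total_bytes = sum(b for b, _ in traffic.values())
    total_requests = sum(r for _, r in traffic.values())
    return {
        name: (
            round(100 * b / total_bytes, 1),
            round(100 * r / total_requests, 1),
            round(b / r),
        )
        for name, (b, r) in traffic.items()
    }


if __name__ == "__main__":
    for name, (byte_pct, req_pct, per_req) in summarize(TRAFFIC).items():
        print(f"{name:15} {byte_pct:5.1f}% bytes {req_pct:5.1f}% requests {per_req:7d} B/request")
```

Dividing bytes by requests also shows why the byte ranking differs from the request ranking: the mean transfer per fediverse request is roughly 24 KB, versus well under 1 KB per scanner request.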


Temporal Patterns

Hourly cadence:

Day‑of‑week patterns:


Content & Visitors

Top pages (all time):

Referrers: Google dominates (232 from https://www.google.com/, 31 from google.com). rud.is (the parent domain) sends 37 referrals. A handful from inoreader.com (4) and m.baidu.com (4).

Protocol adoption:

Browser families (visitor class):

RSS subscribers: 3 IPs account for 2,234 rss_reader requests. A small but loyal audience.


Agent‑Artifact Access Patterns

Since 2026-05-13 the site serves llms.txt and .md versions of every post for LLM agents. Over the full period:

Top consumers of agent artifacts (by user‑agent):

  1. TLM-Audit-Scanner/1.0 — 68 requests, 9 artifacts (this is a scanner, not an AI crawler)
  2. Bytespider — 30 requests, 2 artifacts
  3. Amazonbot — 24 requests, 21 artifacts
  4. Googlebot — 21 requests, 11 artifacts
  5. ClaudeBot — 18 requests, 18 artifacts
  6. MJ12bot — 18 requests, 13 artifacts
  7. Baiduspider — 16 requests, 7 artifacts
  8. AIWebIndex/2.0 — 15 requests, 15 artifacts
  9. Applebot — 14 requests, 6 artifacts
  10. GPTBot — 10 requests, 9 artifacts

Most‑accessed post .md files:

Comparison to HTML views: The most popular .md file (ollama-usage.md, 31 requests) draws roughly one‑eighth the traffic of its HTML counterpart (/posts/2026-04-04-ollama-usage/, 251 hits). Agent‑artifact traffic is still a small fraction of total content consumption, but it is being used by the intended audience: AI crawlers account for the majority of .md requests. The notable exception is TLM-Audit-Scanner, a scanner currently classified as a visitor; it should be reclassified as a scanner.

Bandwidth cost: 785 KB for all post .md files and 78 KB for llms.txt — negligible compared to the 636 MB total served.


AI Crawler Activity

Crawler | Requests | IPs | Unique Pages
Bytespider (ByteDance) | 1,663 | 698 | 127
ClaudeBot (Anthropic) | 1,121 | 44 | 171
Amazonbot (Amazon) | 716 | 341 | 196
OAI-SearchBot (OpenAI) | 538 | 76 | 138
GPTBot (OpenAI) | 451 | 20 | 210
Applebot (Apple) | 312 | 214 | 135
Meta Crawler (Meta) | 224 | 151 | 144
PetalBot (Huawei) | 166 | 20 | 106
CCBot (Common Crawl) | 131 | 6 | 104
PerplexityBot | 113 | 11 | 19
Other AI | 23 | 12 | 12
Timpibot | 4 | 4 | 1
YouBot (You.com) | 2 | 1 | 2

Bytespider is the most aggressive by request count and IP spread (698 IPs). ClaudeBot is more concentrated (44 IPs) and still reaches 171 unique pages. GPTBot covers 210 unique pages from only 20 IPs, the widest page coverage of any AI crawler in the table. For a small personal blog, this level of AI crawler attention is notable but not burdensome; AI crawlers account for about 90.8 MB, or 14.3% of all bytes transferred.
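One way to make the "spread versus concentrated" comparison concrete is to divide requests by distinct IPs and by unique pages. This is a sketch using only the three crawlers discussed above, with counts taken from the table; the `concentration` helper and its metric names are assumptions, not metrics the report itself computes.

```python
# (requests, distinct IPs, unique pages) for the three crawlers discussed
# in the paragraph above, taken from the AI crawler table.
CRAWLERS = {
    "Bytespider": (1_663, 698, 127),
    "ClaudeBot": (1_121, 44, 171),
    "GPTBot": (451, 20, 210),
}


def concentration(requests: int, ips: int, pages: int) -> dict:
    """Hypothetical metrics: mean requests per IP and per unique page."""
    return {
        "requests_per_ip": requests / ips,
        "requests_per_page": requests / pages,
    }


if __name__ == "__main__":
    for name, counts in CRAWLERS.items():
        m = concentration(*counts)
        print(f"{name:10} {m['requests_per_ip']:6.1f} req/IP {m['requests_per_page']:6.1f} req/page")
```

A low requests-per-IP value points to a widely distributed fleet, while a low requests-per-page value points to little re-fetching of the same content.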


Search Engine Crawlers

Crawler | Requests | IPs | Unique Pages | Last Seen (UTC)
Bingbot | 615 | 214 | 130 | 2026-06-07 23:03:22.763
Baiduspider | 549 | 261 | 181 | 2026-06-07 21:39:22.556
Googlebot | 531 | 47 | 184 | 2026-06-07 10:04:13.435
Yandex | 68 | 57 | 6 | 2026-06-07 15:30:29.971
DuckDuckBot | 66 | 3 | 4 | 2026-06-07 14:12:18.044

All major engines are actively indexing. Googlebot is efficient (47 IPs, 184 pages). Baiduspider uses many IPs (261) and covers 181 pages. Bingbot is also thorough. DuckDuckBot and Yandex are light.


Fediverse Activity

Fediverse link‑preview fetches are a steady background hum. Over the full period, 10,903 requests from 2,645 unique IPs. The most active instances (by request count):

These are distinct instances, not a single recurring reader. The diversity suggests the site is being shared across the fediverse regularly.


Scanner & Recon Activity

Top scanners (all time):

Recon URIs (most hit):

Enrichment: No enrichment data is available for the scanner IPs listed above. The ENRICHMENT block contains only benign crawler IPs (Amazonbot, Applebot, Bytespider, Facebook). For scanner IPs, we lack Censys context.

Canary hits: The site returns 200 for .env (39 hits, 25 IPs) and .git/config (22 hits, 17 IPs). These are decoy responses — the files do not exist, but the server deliberately serves a small payload to trap scanners. The canaries are working.
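A rough sketch of how canary hits and distinct IPs could be tallied from JSON access logs. It assumes Caddy's default JSON access log shape (`request.uri`, `request.remote_ip`, and a top-level `status`); the `canary_hits` function is hypothetical, and the sample lines use documentation-range IP addresses rather than real log data.

```python
import json
from collections import defaultdict

CANARY_PATHS = ("/.env", "/.git/config")


def canary_hits(log_lines):
    """Return {path: (hits, distinct IPs)} for 200 responses on canary paths."""
    hits = defaultdict(int)
    ips = defaultdict(set)
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        request = entry.get("request", {})
        path = request.get("uri", "").split("?", 1)[0]  # drop any query string
        if path in CANARY_PATHS and entry.get("status") == 200:
            hits[path] += 1
            ips[path].add(request.get("remote_ip"))
    return {p: (hits[p], len(ips[p])) for p in CANARY_PATHS}


if __name__ == "__main__":
    # Made-up sample entries (RFC 5737 documentation addresses).
    sample = [
        json.dumps({"status": 200, "request": {"uri": "/.env", "remote_ip": "192.0.2.1"}}),
        json.dumps({"status": 200, "request": {"uri": "/.env?x=1", "remote_ip": "192.0.2.1"}}),
        json.dumps({"status": 200, "request": {"uri": "/.git/config", "remote_ip": "198.51.100.7"}}),
        json.dumps({"status": 404, "request": {"uri": "/wp-login.php", "remote_ip": "198.51.100.7"}}),
    ]
    print(canary_hits(sample))
```

Counting distinct IPs alongside raw hits separates one persistent scanner from many independent ones.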


IP Persistence

The following IPs appear across multiple days (approximation — IPv6 privacy extensions and NAT mean IP ≠ individual):

The 217.113.194.x range (multiple IPs) appears frequently; these are likely Barkrowler (a crawler from babbar.tech), which is listed in the classification gaps. The 66.249.74.x range is Googlebot.
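Because the persistent addresses cluster into ranges, persistence can also be measured per network rather than per IP. The sketch below is one possible approach, not the report's method: the /24 (IPv4) and /64 (IPv6) grouping, the function names, and the sample events are all assumptions, and the sample uses documentation-range addresses.

```python
import ipaddress
from collections import defaultdict
from datetime import datetime, timezone


def network_key(ip: str) -> str:
    """Group IPv4 addresses by /24 and IPv6 addresses by /64 (assumed prefixes)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 64
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def persistent_networks(events, min_days: int = 2) -> dict:
    """events: iterable of (datetime, ip). Return {network: distinct UTC days seen}."""
    days = defaultdict(set)
    for when, ip in events:
        days[network_key(ip)].add(when.astimezone(timezone.utc).date())
    return {net: len(d) for net, d in days.items() if len(d) >= min_days}


if __name__ == "__main__":
    utc = timezone.utc
    # Made-up sample events for illustration only.
    events = [
        (datetime(2026, 6, 6, 10, 0, tzinfo=utc), "192.0.2.10"),
        (datetime(2026, 6, 7, 11, 0, tzinfo=utc), "192.0.2.20"),
        (datetime(2026, 6, 7, 12, 0, tzinfo=utc), "198.51.100.1"),
    ]
    print(persistent_networks(events))
```

Grouping by prefix catches a crawler that rotates addresses within one range, at the cost of occasionally merging unrelated hosts that share a network.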


Observations

  1. Visitor traffic cratered on June 7: 139 requests vs. a 7‑day average of 935. Previous Sundays had 221, 93, 159, 229, 102, and 241 requests, so 139 is within the normal Sunday range and the gap comes from an unusually high trailing week rather than an unusually low Sunday. Likely a referral source dried up or a post lost its viral momentum. Worth investigating the referrer logs for the prior days.

  2. The TLM-Audit-Scanner is the top consumer of .md files — 68 requests, more than any AI crawler. This scanner is already in the classification gaps and should be reclassified from visitor to scanner. It’s ironic that a security scanner is the heaviest user of the “designed for AI agents” artifacts.

  3. Canary decoys are working: .env and .git/config return 200 to scanners, and the hits are ongoing. The 264 bytes served for .git/config is a tiny cost for early‑warning detection. No new canary triggers appeared today, but the existing ones continue to attract attention.
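The visitor-traffic comparison in observation 1 can be sketched as two ratios: the day's count against the trailing 7‑day average, and against the mean of earlier Sundays. The numbers are the ones quoted in the observation; using a same-weekday mean as a second baseline is an assumption here, not something the report states it does.

```python
from statistics import mean

# Visitor request counts quoted in observation 1.
TODAY = 139
TRAILING_7DAY_AVG = 935
PREVIOUS_SUNDAYS = [221, 93, 159, 229, 102, 241]


def ratio_to_baseline(value: float, baseline: float) -> float:
    """Fraction of the baseline that the observed value represents."""
    return value / baseline


vs_trailing_week = ratio_to_baseline(TODAY, TRAILING_7DAY_AVG)
vs_same_weekday = ratio_to_baseline(TODAY, mean(PREVIOUS_SUNDAYS))

if __name__ == "__main__":
    print(f"vs trailing 7-day average: {vs_trailing_week:.2f}")
    print(f"vs mean of previous Sundays: {vs_same_weekday:.2f}")
```

With the report's numbers the first ratio is about 0.15 and the second about 0.80, so the choice of baseline changes how severe the drop looks.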


Generated by — Caddy logs → DuckDB → Censys → Ollama → Astro


