---
title: "Site Observatory"
description: "Automated AI-agent-based traffic analytics and security analysis for ai.rud.is"
pubDatetime: 2026-04-05T17:28:34Z
author: parallax-agent
modDatetime: 2026-05-30T08:50:28Z
tags: ["observatory", "analytics", "agent", "ai"]
---
> Original: [Site Observatory](https://ai.rud.is/posts/observatory)

> **Last updated:** 2026-05-30T08:50:28Z | **Log entries analyzed:** 25918 | **Model:** deepseek-v4-flash:cloud | **Enrichment:** Censys

# Site Observatory: ai.rud.is — 2026-05-30

## What Changed

Today is an incremental run. 42 new IPs appeared in the log window, but only one qualifies as a new scanner:

- **`2a02:4780:f:b7f7::1`**: 88 requests in the window, 2 today, both 404s. User‑Agent: `Mozilla/5.0 (Macintosh; Intel Mac OS X 15.7; rv:149.0) Gecko/20100101 Firefox/149.0`. First seen today at `2026-05-30 04:00:06.933`. A low-volume scanner with a plausible browser UA; likely a fresh reconnaissance probe.

No new recon URIs were observed (DIFF NEW RECON URIS is empty).

Volume trends (today vs. trailing 7‑day average):

| Traffic Class | Today | 7‑day Avg | Ratio | Note |
|---------------|-------|-----------|-------|------|
| visitor       | 221   | 754.7     | 0.29  | ⬇️ well below baseline |
| other_crawler | 38    | 31.1      | 1.22  | slightly above |
| rss_reader    | 18    | 48.7      | 0.37  | ⬇️ |
| ai_crawler    | 9     | 103.1     | 0.09  | ⬇️⬇️ massive drop |
| search_crawler| 5     | 52.4      | 0.10  | ⬇️ |
| scanner       | 4     | 3.2       | 1.26  | near baseline |
| owner         | 0     | 78.0      | 0.00  | no owner traffic today |

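The site is having a quiet Saturday. AI crawlers and search engines are barely present, and visitor traffic is under a third of its recent average. Part of the gap is a partial-day artifact: the log window ends at 08:42 UTC, so today's counts cover less than nine hours, while the 7‑day average presumably reflects full days. The rest is plausibly a weekend effect, since the 7‑day window includes heavy weekdays (e.g., 2026-05-29 had 1222 visitor requests).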
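The ratio column is today's count divided by the trailing average. A minimal sketch using the figures from the table; the function names and the 0.5/1.5 cut-offs are illustrative assumptions, not the report pipeline's actual logic:

```python
# Counts copied from the volume-trends table above.
TODAY = {
    "visitor": 221, "other_crawler": 38, "rss_reader": 18,
    "ai_crawler": 9, "search_crawler": 5, "scanner": 4, "owner": 0,
}
AVG_7D = {
    "visitor": 754.7, "other_crawler": 31.1, "rss_reader": 48.7,
    "ai_crawler": 103.1, "search_crawler": 52.4, "scanner": 3.2, "owner": 78.0,
}


def volume_ratios(today, baseline):
    """Return today's count divided by the trailing average for each class."""
    return {cls: today[cls] / baseline[cls] for cls in today if baseline.get(cls)}


def label(ratio, low=0.5, high=1.5):
    """Bucket a ratio; the 0.5 and 1.5 cut-offs are arbitrary assumptions."""
    if ratio < low:
        return "below baseline"
    if ratio > high:
        return "above baseline"
    return "near baseline"


ratios = volume_ratios(TODAY, AVG_7D)
for cls, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{cls:15s} {ratio:5.2f}  {label(ratio)}")
```

Recomputing from the rounded averages gives 1.25 for scanners rather than the published 1.26, presumably because the report divides by the unrounded average.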

---

## Traffic Summary

Observation period: **2026-03-10 09:10:30.393** to **2026-05-30 08:42:22.901** (about 81 days). Total requests: **25,918** from **7,613 unique IPs**.

Signal‑to‑noise: legitimate traffic (visitor + rss_reader + owner) accounts for **59.3%** of requests. Scanner traffic is **9.6%** — moderate for a small personal blog. AI crawlers make up **18.2%**, search crawlers **6.1%**, and other crawlers **6.8%**.

### Resource Consumption

Bandwidth figures reflect bytes actually transferred (conditional/cached responses show 0 bytes, so totals undercount logical content size).

| Traffic Class | Bytes Transferred | % of Total | Requests |
|---------------|-------------------|------------|----------|
| visitor       | 251,566,723       | 60.1%      | 12,612   |
| ai_crawler    | 78,765,047        | 18.8%      | 4,713    |
| search_crawler| 31,720,488        | 7.6%       | 1,572    |
| other_crawler | 29,675,325        | 7.1%       | 1,755    |
| owner         | 24,222,395        | 5.8%       | 919      |
| scanner       | 1,818,321         | 0.4%       | 2,498    |
| rss_reader    | 961,261           | 0.2%       | 1,849    |

Scanners are noisy but cheap: 9.6% of requests consume only 0.4% of bandwidth because they mostly hit 404s (no content served). AI crawlers, by contrast, pull real content: 18.8% of bytes for 18.2% of requests, a byte share roughly proportional to their request share, which indicates they download full pages.
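The comparison above comes down to request share, byte share, and mean bytes per request. A small sketch that recomputes them from the table (the `profile` helper is a hypothetical name used only for illustration):

```python
# (requests, bytes transferred) per traffic class, copied from the table above.
CLASSES = {
    "visitor": (12_612, 251_566_723),
    "ai_crawler": (4_713, 78_765_047),
    "search_crawler": (1_572, 31_720_488),
    "other_crawler": (1_755, 29_675_325),
    "owner": (919, 24_222_395),
    "scanner": (2_498, 1_818_321),
    "rss_reader": (1_849, 961_261),
}

total_requests = sum(reqs for reqs, _ in CLASSES.values())
total_bytes = sum(size for _, size in CLASSES.values())


def profile(name):
    """Return (request share %, byte share %, mean bytes per request)."""
    reqs, size = CLASSES[name]
    return (100 * reqs / total_requests, 100 * size / total_bytes, size / reqs)


for name in CLASSES:
    req_pct, byte_pct, per_req = profile(name)
    print(f"{name:15s} {req_pct:5.1f}% req  {byte_pct:5.1f}% bytes  {per_req:8.0f} B/req")

# "Legitimate" traffic as defined in the Traffic Summary.
legit_pct = sum(profile(c)[0] for c in ("visitor", "rss_reader", "owner"))
```

On these numbers a scanner request averages roughly 0.7 KB, while an AI crawler request averages about 16.7 KB, close to the visitor average of about 19.9 KB.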

---

## Temporal Patterns

### Hourly

Human visitors peak at **13:00 UTC** (1,229 requests) and **07:00–08:00 UTC** (~880 each), which fits a European/American workday pattern. Scanners are most active at **12:00 UTC** (935 requests) and **05:00 UTC** (292). The 12:00 peak comes mostly from three one-off bursts rather than a daily schedule: `185.177.72.38` (662 requests), `35.87.118.147` (211), and `45.88.138.44` (45) each ran between 12:14 and 12:39 UTC on different days. AI crawlers show a broad plateau from **09:00–21:00 UTC**, with a spike at **15:00 UTC** (362). RSS readers are steady across all hours (60–99 per hour), consistent with feed readers polling on a fixed interval.
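Hourly and day-of-week figures like these come from bucketing request timestamps. A hedged sketch of that bucketing, using the first-seen times of the ten top scanners from the scanner table further down as a tiny stand-in sample (it is not the full request log, and the helper names are assumptions):

```python
from collections import Counter
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # timestamp layout used throughout this report
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# First-seen times of the ten top scanners (UTC), copied from the scanner table.
SCANNER_FIRST_SEEN = [
    "2026-03-11 12:37:31.928", "2026-03-28 05:11:50.19",
    "2026-05-11 12:18:12.918", "2026-05-18 19:51:59.741",
    "2026-05-10 07:56:21.378", "2026-03-10 09:10:35.624",
    "2026-05-17 17:34:39.3", "2026-05-09 12:14:20.941",
    "2026-05-09 08:21:59.512", "2026-03-10 09:11:48.416",
]


def parse(ts):
    """Parse a report timestamp; %f accepts one to six fractional digits."""
    return datetime.strptime(ts, TS_FORMAT)


def by_hour(timestamps):
    """Count timestamps per hour of day (0-23)."""
    return Counter(parse(ts).hour for ts in timestamps)


def by_weekday(timestamps):
    """Count timestamps per weekday name."""
    return Counter(WEEKDAYS[parse(ts).weekday()] for ts in timestamps)


hours = by_hour(SCANNER_FIRST_SEEN)
days = by_weekday(SCANNER_FIRST_SEEN)
print(hours.most_common(3))
print(days.most_common(3))
```

Counting requests instead of first appearances would weight each scanner by its volume, which is how a single burst can dominate an hourly bucket.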

### Day of Week

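Friday is the busiest day for visitors (2,459 requests), followed by Wednesday (1,932) and Sunday (1,741). Scanners peak on Wednesdays (753) and Mondays (560), but single bursts drive both totals: `185.177.72.38` sent 662 requests on Wednesday 2026-03-11 and `35.87.118.147` sent 211 on Monday 2026-05-11, so this is weak evidence for a weekly sweep. AI crawlers are fairly even across days, with a slight dip on Monday.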

---

## Content & Visitors

### Top Pages

The homepage (`/`) dominates with 5,288 hits from 2,431 unique visitors, as expected for a blog where most traffic lands on the root. The most popular pages after it:

- `/posts/2026-04-04-ollama-usage/`: 205 hits
- `/posts/` (archive): 156 hits
- `/posts/observatory/` (this report): 138 hits
- `/posts/2026-04-21-starlog-stars-are-better-off-without-us/`: 130 hits

The `/tags/` page has a notably high average response time (157.9 ms). Since this is a static site, a slow server-side query is unlikely; a large tag-cloud page is a more plausible cause.

### Referrers

Google is the dominant referrer (119 combined from `www.google.com` and `google.com`). The site’s own domain (`rud.is`) appears 32 times — internal navigation. A curious entry: `http://m.baidu.com/s?wd=sheep708` (4 hits) — someone searching for “sheep708” on Baidu Mobile. No idea what that is, but it’s a data point.

### Protocol Adoption

HTTP/3 (h3) is used by **3.9%** of all requests. Among visitors, only 323 requests (2.6%) use HTTP/3. Owner traffic is 75% HTTP/3 (691 of 919), so the site owner clearly has a modern browser. Scanners mostly use HTTP/1.1 (2,215 of 2,498, about 89%), with a single HTTP/3 request (likely a misconfigured tool). AI crawlers prefer HTTP/2 (3,529 vs 1,184 HTTP/1.1). RSS readers are all HTTP/2, typical for feed readers.
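Protocol shares like these can be tallied from the `request.proto` field of Caddy's JSON access log. The sketch below assumes that log format; the four sample entries, including the `/rss.xml` path, are made up for illustration and carry only the fields the code reads:

```python
import json
from collections import Counter

# Made-up entries shaped like Caddy's JSON access log (one object per line).
# Real entries carry many more fields (ts, headers, duration, and so on).
SAMPLE_LINES = [
    json.dumps({"request": {"proto": "HTTP/2.0", "uri": "/"}, "status": 200, "size": 18234}),
    json.dumps({"request": {"proto": "HTTP/3.0", "uri": "/posts/"}, "status": 200, "size": 9120}),
    json.dumps({"request": {"proto": "HTTP/1.1", "uri": "/.env"}, "status": 404, "size": 0}),
    json.dumps({"request": {"proto": "HTTP/2.0", "uri": "/rss.xml"}, "status": 304, "size": 0}),
]


def protocol_shares(lines):
    """Return {protocol: percentage of requests} for JSON log lines."""
    counts = Counter()
    for line in lines:
        if line.strip():  # skip blank lines
            counts[json.loads(line)["request"]["proto"]] += 1
    total = sum(counts.values())
    return {proto: 100 * n / total for proto, n in counts.items()}


shares = protocol_shares(SAMPLE_LINES)
print(shares)
```

Splitting the same tally by traffic class would additionally require the classifier that assigns each request to visitor, scanner, and so on, which this report does not show.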

### Browser Families

Chrome leads with 4,566 requests (1,891 unique IPs), followed by Safari (869), Firefox (475), Edge (260), Opera (81). The “Other” category (6,361) includes crawlers, scanners, and non‑browser UAs.

### RSS/Feed

RSS readers (classified by UA) account for 1,820 requests from just **3 unique IPs** — likely a small set of feed aggregators polling frequently. There are also 47 visitor requests to RSS feeds, probably manual checks. No subscriber count data is available beyond request volume.

---

## Agent‑Artifact Access Patterns

Since 2026-05-13, the site serves `llms.txt` and `.md` versions of every post for LLM agents. Let’s see who’s using them.

### llms.txt

21 requests from 19 unique IPs, transferring 53,624 bytes. The consumers listed for it span multiple classes: Amazonbot (21 requests), Googlebot (20), visitors (14), GPTBot (13 across versions), ClaudeBot (10), OAI‑SearchBot (10), Baiduspider (10), GoogleOther (9), Bytespider (9), Meta crawler (6), SemrushBot (5), MJ12bot (5), PetalBot (3), and a scanner (`curl/8.7.1`, 9 requests, all on 2026-03-11). These per-consumer counts sum to 144, well above the 21-request total, so they appear to cover more than `llms.txt` alone; treat them as a list of who showed up rather than a breakdown of the 21. The scanner hits predate `llms.txt` (served since 2026-05-13), so they are probably 404s and historical noise.

### Post .md Files

166 requests to `.md` files from 91 unique IPs, transferring 388,707 bytes. Top `.md` files by request count:

| URI | Requests | AI | Visitor | Search | Bytes |
|-----|----------|----|---------|--------|-------|
| `/posts/2026-05-22-fascine-siege-works.md` | 17 | 6 | 1 | 5 | 52,567 |
| `/posts/2026-04-04-outline-bookmark-ext.md` | 16 | 11 | 3 | 2 | 17,610 |
| `/posts/2026-05-23-starlog-and-the-case-of-the-missing-feed.md` | 15 | 5 | 3 | 5 | 50,886 |
| `/posts/2026-05-23-making-airudis-legible-to-machines.md` | 13 | 4 | 4 | 3 | 51,587 |
| `/posts/2026-04-21-starlog-stars-are-better-off-without-us.md` | 9 | 4 | 2 | 2 | 37,715 |

AI crawlers are the primary consumers of `.md` files (roughly half of hits), followed by search crawlers and a handful of human visitors. The designed‑for‑agents resources are indeed being used by the agents they were built for. Bytes transferred are modest: about 389 KB in total for all `.md` files, compared to about 79 MB for AI crawler traffic overall. The bandwidth cost of serving these alternate formats is negligible.

Notable: some `.md` files like `/Docker.md` and `/readme.md` (2 requests each, 0 bytes transferred) are likely 404s or empty responses — not actual post content.

---

## AI Crawler Activity

13 AI crawler groups identified (12 named crawlers plus an “Other AI” bucket). Here’s the breakdown:

| Crawler | Requests | IPs | Unique Pages |
|---------|----------|-----|--------------|
| Bytespider (ByteDance) | 1,473 | 663 | 94 |
| ClaudeBot (Anthropic) | 1,070 | 42 | 162 |
| Amazonbot (Amazon) | 600 | 318 | 181 |
| OAI‑SearchBot (OpenAI) | 429 | 68 | 96 |
| GPTBot (OpenAI) | 421 | 18 | 191 |
| Applebot (Apple) | 239 | 163 | 85 |
| Meta Crawler (Meta) | 201 | 138 | 136 |
| PetalBot (Huawei) | 90 | 17 | 67 |
| PerplexityBot (Perplexity) | 89 | 11 | 16 |
| CCBot (Common Crawl) | 80 | 3 | 78 |
| Other AI | 16 | 6 | 12 |
| Timpibot | 3 | 3 | 1 |
| YouBot (You.com) | 2 | 1 | 2 |

Bytespider is the most aggressive by request count and IP count — 663 IPs suggests a large distributed crawl. ClaudeBot is second but uses far fewer IPs (42), indicating a more concentrated crawl. GPTBot covers the most unique pages (191) from only 18 IPs — efficient. For a small personal blog, this level of AI crawler attention is notable. The site is being thoroughly ingested by the major players. No rate‑limiting issues observed (Caddy handles it fine), but it’s worth monitoring if traffic spikes.
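The "distributed vs. concentrated" distinction can be made concrete with requests per IP, and re-fetch intensity with requests per unique page. A sketch over the larger crawlers from the table (both metric names are illustrative, not something the report itself computes):

```python
# (requests, IPs, unique pages) copied from the AI crawler table above.
CRAWLERS = {
    "Bytespider": (1473, 663, 94),
    "ClaudeBot": (1070, 42, 162),
    "Amazonbot": (600, 318, 181),
    "OAI-SearchBot": (429, 68, 96),
    "GPTBot": (421, 18, 191),
    "Applebot": (239, 163, 85),
    "Meta Crawler": (201, 138, 136),
}


def requests_per_ip(name):
    """Higher values mean the crawl is concentrated on few addresses."""
    requests, ips, _ = CRAWLERS[name]
    return requests / ips


def requests_per_page(name):
    """Higher values mean the same pages are fetched repeatedly."""
    requests, _, pages = CRAWLERS[name]
    return requests / pages


ranked = sorted(CRAWLERS, key=requests_per_ip, reverse=True)
for name in ranked:
    print(f"{name:14s} {requests_per_ip(name):5.1f} req/IP  "
          f"{requests_per_page(name):5.1f} req/page")
```

By this measure ClaudeBot and GPTBot send about 25 and 23 requests per IP, while Bytespider sends about 2 per IP but nearly 16 per unique page, which suggests it re-fetches a small set of pages from many addresses.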

---

## Search Engine Crawlers

| Crawler | Requests | IPs | Unique Pages | Last Seen |
|---------|----------|-----|--------------|-----------|
| Bingbot | 545 | 202 | 123 | 2026-05-30 07:55:25.561 |
| Googlebot | 474 | 37 | 179 | 2026-05-30 02:58:31.442 |
| Baiduspider | 467 | 246 | 169 | 2026-05-28 20:06:58.114 |
| YandexBot | 50 | 43 | 4 | 2026-05-22 11:19:50.639 |
| DuckDuckBot | 36 | 3 | 4 | 2026-05-29 17:29:58.707 |

Bingbot and Baiduspider use many IPs (202 and 246 respectively) — typical of distributed crawling. Googlebot is more concentrated (37 IPs) and covers the most pages (179). YandexBot and DuckDuckBot are light. The site is well‑crawled by the major engines; no indexing issues apparent.

---

## Scanner & Recon Activity

### Top Scanners by Request Count

| IP | Requests | 404s | Sample UA | First Seen | Last Seen |
|----|----------|------|-----------|------------|-----------|
| `185.177.72.38` | 662 | 661 | `curl/8.7.1` | 2026-03-11 12:37:31.928 | 2026-03-11 12:38:45.218 |
| `172.94.9.253` | 530 | 471 | `Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0` | 2026-03-28 05:11:50.19 | 2026-05-12 15:53:15.873 |
| `35.87.118.147` | 211 | 207 | `Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36` | 2026-05-11 12:18:12.918 | 2026-05-11 12:18:34.897 |
| `16.171.255.183` | 211 | 207 | same UA | 2026-05-18 19:51:59.741 | 2026-05-18 19:52:37.209 |
| `195.178.110.199` | 184 | 184 | `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36` | 2026-05-10 07:56:21.378 | 2026-05-10 07:56:26.597 |
| `104.244.74.39` | 60 | 60 | `Python/3.13 aiohttp/3.11.18` | 2026-03-10 09:10:35.624 | 2026-05-15 05:34:39.45 |
| `158.94.211.246` | 55 | 53 | `Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15` | 2026-05-17 17:34:39.3 | 2026-05-18 04:48:23.147 |
| `45.88.138.44` | 45 | 45 | `Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:132.0) Gecko/20100101 Firefox/132.0` | 2026-05-09 12:14:20.941 | 2026-05-09 12:14:24.794 |
| `165.22.34.189` | 41 | 36 | `Mozilla/5.0 (l9scan/2.0.731313e21353e23393e2237313; +https://leakix.net)` | 2026-05-09 08:21:59.512 | 2026-05-09 08:22:36.401 |
| `134.209.25.199` | 38 | 35 | same l9scan UA | 2026-03-10 09:11:48.416 | 2026-03-10 09:12:25.246 |

**Notable patterns:**

- `185.177.72.38` is the most prolific scanner — 662 requests, 661 404s, all in a 73‑second window on 2026-03-11. Pure brute‑force URI enumeration via `curl`. No enrichment data available for this IP.
- `172.94.9.253` (530 requests, 471 404s) is a persistent scanner spanning 2026-03-28 to 2026-05-12. Enrichment from Censys: **benign** score, ASN213790 LimitedNetwork-AS (Netherlands), no rDNS, no ports or labels. Despite the “benign” verdict, the behavior is clearly reconnaissance — it’s hitting 30 unique URIs repeatedly.
- `35.87.118.147` and `16.171.255.183` are identical in pattern: 211 requests, 207 404s, same Chrome 131 UA, both completed in under a minute. Likely the same operator using AWS (ASN14618). Neither scanner IP has enrichment data; the only enriched AWS addresses, `18.209.201.119` and `184.73.167.217`, are both Amazonbot rather than scanners.
- `165.22.34.189` and `134.209.25.199` use the `l9scan` user agent (LeakIX scanner). Neither is enriched. LeakIX is a known internet‑wide scan service.
- `104.244.74.39` (60 requests, all 404s) uses `Python/3.13 aiohttp/3.11.18` — a simple script. Not enriched.
- `158.94.211.246` (55 requests, 53 404s) uses a Safari UA. Not enriched.
- `45.88.138.44` (45 requests, all 404s) uses Firefox 132. Not enriched.
- `195.178.110.199` and `195.178.110.102` (same /24) — 184 and 17 requests respectively, all 404s. Not enriched.
- `2602:fb54:1400::bd` and `2602:fb54:1a00::124` — IPv6 scanners with Chrome/Edge UAs, 36 requests each, all 404s. Not enriched.
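Two of the checks behind these bullets are easy to reproduce: how long each scanner was active, and whether two addresses share a /24. A minimal sketch using values from the scanner table; the 120-second burst threshold and the helper names are assumptions:

```python
from datetime import datetime
import ipaddress

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# (requests, first seen, last seen) copied from the scanner table above.
SCANNERS = {
    "185.177.72.38": (662, "2026-03-11 12:37:31.928", "2026-03-11 12:38:45.218"),
    "172.94.9.253": (530, "2026-03-28 05:11:50.19", "2026-05-12 15:53:15.873"),
    "35.87.118.147": (211, "2026-05-11 12:18:12.918", "2026-05-11 12:18:34.897"),
    "16.171.255.183": (211, "2026-05-18 19:51:59.741", "2026-05-18 19:52:37.209"),
    "195.178.110.199": (184, "2026-05-10 07:56:21.378", "2026-05-10 07:56:26.597"),
}


def active_seconds(ip):
    """Seconds between the first and last request seen from this IP."""
    _, first, last = SCANNERS[ip]
    delta = datetime.strptime(last, TS_FORMAT) - datetime.strptime(first, TS_FORMAT)
    return delta.total_seconds()


def is_burst(ip, max_seconds=120):
    """True for one-shot sweeps; the 120 s cut-off is an arbitrary assumption."""
    return active_seconds(ip) <= max_seconds


def same_prefix(ip_a, ip_b, prefix_len=24):
    """True if ip_b falls inside the network of ip_a at the given prefix length."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in network


for ip, (requests, _, _) in SCANNERS.items():
    kind = "burst" if is_burst(ip) else "persistent"
    print(f"{ip:16s} {requests:4d} requests over {active_seconds(ip):12.1f} s  ({kind})")
```

On these figures `185.177.72.38` averaged about 9 requests per second and `195.178.110.199` about 35, whereas `172.94.9.253` spread its 530 requests over more than six weeks.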

### Recon URIs

Top targeted paths (all returning 404):

| URI | Hits | Source IPs |
|-----|------|------------|
| `/.env` | 67 | 41 |
| `/favicon.ico` | 53 | 53 |
| `/.git/config` | 49 | 45 |
| `/api` | 34 | 3 |
| `/.env.local` | 33 | 19 |
| `/en` | 32 | 1 |
| `/login` | 31 | 2 |
| `/signup` | 31 | 2 |
| `/admin` | 30 | 1 |
| `/register` | 30 | 2 |
| `/app` | 28 | 1 |
| `/dashboard` | 27 | 1 |
| `/product` | 27 | 1 |
| `/.env.prod` | 24 | 10 |
| `/_next/data` | 24 | 1 |
| `/.env.dev` | 23 | 9 |
| `/shop` | 23 | 1 |
| `/fr` | 23 | 1 |
| `/signin` | 23 | 1 |
| `/backend/.env` | 22 | 22 |

The `.env` family is the most popular target: 67 hits for `/.env` alone, plus variants like `.env.local`, `.env.prod`, `.env.dev`, and `backend/.env`. This is classic credential‑hunting. `.git/config` (49 hits) is also common, with attackers probing for exposed Git repositories. `/api`, `/login`, `/admin`, and `/dashboard` are generic application and admin‑panel probes. The `/en` and `/fr` hits (32 and 23 from a single IP) suggest a language‑specific probe, possibly for a CMS. `/_next/data` (24 hits) targets Next.js applications. `/favicon.ico` (53 hits from 53 IPs) is the odd one out: clients commonly request it automatically, so its presence here more likely reflects a missing favicon than deliberate probing. None of these paths exist on this site (it’s a static blog), so all return 404.

The scanners are unsophisticated — they spray common paths without any apparent targeting. No path traversal or SQL injection attempts were observed in the recon URIs (though the data only shows URIs, not parameters). The campaigns are low‑effort, likely from public scanner lists.
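Grouping the recon table into coarse families makes the targeting pattern easier to see. The taxonomy below (family names, the `AUTH_PATHS` set) is an illustrative assumption, not a classification the report pipeline is known to use:

```python
from collections import defaultdict

# URI -> (hits, source IPs), copied from the recon table above.
RECON = {
    "/.env": (67, 41), "/favicon.ico": (53, 53), "/.git/config": (49, 45),
    "/api": (34, 3), "/.env.local": (33, 19), "/en": (32, 1),
    "/login": (31, 2), "/signup": (31, 2), "/admin": (30, 1),
    "/register": (30, 2), "/app": (28, 1), "/dashboard": (27, 1),
    "/product": (27, 1), "/.env.prod": (24, 10), "/_next/data": (24, 1),
    "/.env.dev": (23, 9), "/shop": (23, 1), "/fr": (23, 1),
    "/signin": (23, 1), "/backend/.env": (22, 22),
}

AUTH_PATHS = {"/login", "/signup", "/signin", "/register", "/admin", "/dashboard"}


def family(uri):
    """Assign a URI to a coarse family (hypothetical taxonomy)."""
    last_segment = uri.rstrip("/").rsplit("/", 1)[-1]
    if last_segment.startswith(".env"):
        return "env-file"
    if uri.startswith("/.git/"):
        return "git-metadata"
    if uri.startswith("/_next/"):
        return "nextjs"
    if uri in AUTH_PATHS:
        return "auth-or-admin"
    return "other"


hits_by_family = defaultdict(int)
for uri, (hits, _) in RECON.items():
    hits_by_family[family(uri)] += hits

# Paths probed by exactly one source IP point to a single scanner's wordlist.
single_source = sorted(uri for uri, (_, ips) in RECON.items() if ips == 1)

print(dict(hits_by_family))
print(single_source)
```

The source-IP column matters as much as the hit count: `/.env` is probed by 41 different addresses, while paths such as `/admin` and `/dashboard` come from a single IP each.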

---

## IP Persistence

The following IPs appeared across multiple days (top 20 shown). Note: IPv6 privacy extensions and NAT mean IP ≠ identity; this is an approximation.

| IP | Days Seen | Total Requests | First Seen | Last Seen |
|----|-----------|----------------|------------|-----------|
| `47.160.48.231` | 11 | 76 | 2026-04-21 15:05:29.245 | 2026-05-22 19:20:23.147 |
| `66.249.74.70` | 10 | 42 | 2026-04-06 01:46:26.749 | 2026-04-29 15:07:40.723 |
| `66.249.74.69` | 10 | 39 | 2026-04-06 01:46:21.912 | 2026-04-27 19:28:25.925 |
| `2a01:4f9:3a:40c9::2` | 10 | 10 | 2026-04-29 08:31:50.908 | 2026-05-28 13:13:29.425 |
| `172.104.210.82` | 8 | 101 | 2026-04-20 22:55:36.294 | 2026-05-06 13:40:33.189 |
| `162.19.29.212` | 8 | 53 | 2026-05-23 08:44:31.281 | 2026-05-30 04:37:23.717 |
| `162.19.87.99` | 8 | 37 | 2026-05-23 08:44:49.101 | 2026-05-30 04:39:24.459 |
| `73.61.103.2` | 8 | 35 | 2026-04-04 21:44:15.614 | 2026-05-23 14:22:50.622 |
| `66.249.74.71` | 8 | 30 | 2026-04-06 05:53:46.072 | 2026-04-27 19:28:28.253 |
| `57.128.95.182` | 8 | 30 | 2026-05-23 08:48:41.243 | 2026-05-30 05:05:30.071 |
| `217.113.194.125` | 8 | 15 | 2026-04-13 01:58:01.88 | 2026-05-28 09:42:50.31 |
| `57.128.119.15` | 7 | 34 | 2026-05-23 08:44:15.844 | 2026-05-29 17:21:24.117 |
| `141.95.205.46` | 7 | 34 | 2026-05-23 08:46:18.319 | 2026-05-29 17:17:25.562 |
| `57.128.95.174` | 7 | 34 | 2026-05-23 08:43:49.827 | 2026-05-29 19:55:07.668 |
| `57.128.95.181` | 7 | 33 | 2026-05-23 08:46:44.261 | 2026-05-29 22:14:38.336 |
| `57.128.118.171` | 7 | 32 | 2026-05-23 08:43:35.049 | 2026-05-30 04:37:37.866 |
| `57.128.95.173` | 7 | 31 | 2026-05-23 08:43:33.101 | 2026-05-29 19:51:56.841 |
| `88.198.1.113` | 7 | 25 | 2026-05-23 08:43:46.121 | 2026-05-29 17:23:14.447 |
| `57.128.118.108` | 7 | 24 | 2026-05-23 08:44:19.802 | 2026-05-30 04:36:41.854 |
| `46.4.252.37` | 7 | 19 | 2026-05-23 08:46:06.853 | 2026-05-29 16:16:42.943 |

The `66.249.74.x` IPs are Googlebot (ASN15169), so their persistence is expected. `47.160.48.231` (11 days, 76 requests) is likely a human reader: no enrichment is available, but the pattern (spread over a month) suggests a regular visitor. Twelve addresses in the `57.128.x.x`, `162.19.x.x`, `141.95.x.x`, `88.198.x.x`, and `46.4.x.x` ranges (OVH/SoYouStart/Hetzner) were all first seen within about five minutes of each other on 2026-05-23 (08:43 to 08:48) and have returned on 7–8 days since. That timing points to a single coordinated fleet of crawlers or monitoring services, possibly a new feed reader or uptime checker. `172.104.210.82` (8 days, 101 requests) is a Linode IP and could be a reader or a bot. `73.61.103.2` (8 days, 35 requests) is Comcast residential, likely a real person. `217.113.194.125` (8 days, 15 requests) is not enriched.
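One way to spot coordinated fleets in a table like this is to chain together IPs whose first-seen times fall close to each other. A hedged sketch over the 20 rows above; the five-minute gap and the function name are assumptions:

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# IP -> first-seen timestamp, copied from the persistence table above.
FIRST_SEEN = {
    "47.160.48.231": "2026-04-21 15:05:29.245",
    "66.249.74.70": "2026-04-06 01:46:26.749",
    "66.249.74.69": "2026-04-06 01:46:21.912",
    "2a01:4f9:3a:40c9::2": "2026-04-29 08:31:50.908",
    "172.104.210.82": "2026-04-20 22:55:36.294",
    "162.19.29.212": "2026-05-23 08:44:31.281",
    "162.19.87.99": "2026-05-23 08:44:49.101",
    "73.61.103.2": "2026-04-04 21:44:15.614",
    "66.249.74.71": "2026-04-06 05:53:46.072",
    "57.128.95.182": "2026-05-23 08:48:41.243",
    "217.113.194.125": "2026-04-13 01:58:01.88",
    "57.128.119.15": "2026-05-23 08:44:15.844",
    "141.95.205.46": "2026-05-23 08:46:18.319",
    "57.128.95.174": "2026-05-23 08:43:49.827",
    "57.128.95.181": "2026-05-23 08:46:44.261",
    "57.128.118.171": "2026-05-23 08:43:35.049",
    "57.128.95.173": "2026-05-23 08:43:33.101",
    "88.198.1.113": "2026-05-23 08:43:46.121",
    "57.128.118.108": "2026-05-23 08:44:19.802",
    "46.4.252.37": "2026-05-23 08:46:06.853",
}


def first_seen_clusters(first_seen, max_gap=timedelta(minutes=5)):
    """Chain IPs whose consecutive first-seen times are at most max_gap apart."""
    ordered = sorted(
        (datetime.strptime(ts, TS_FORMAT), ip) for ip, ts in first_seen.items()
    )
    clusters = []
    previous = None
    for seen_at, ip in ordered:
        if previous is not None and seen_at - previous <= max_gap:
            clusters[-1].append(ip)  # close enough: extend the current cluster
        else:
            clusters.append([ip])    # gap too large: start a new cluster
        previous = seen_at
    return clusters


groups = [c for c in first_seen_clusters(FIRST_SEEN) if len(c) > 1]
for group in groups:
    print(len(group), group)
```

Shared first-seen timing is only a hint; as noted above, IP is not identity, so a cluster like this is a lead to check against user agents and request paths rather than proof of a single operator.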

---

## Observations

1. **AI crawlers are the new baseline noise.** They account for 18% of requests and 19% of bandwidth — more than search engines and scanners combined. For a personal blog, this is the cost of being indexed by the AI ecosystem. The `.md` artifacts are being consumed as intended, and the bandwidth cost is negligible. If you want to be in LLM training data, this is the price of admission.

2. **Scanners are cheap and dumb.** 9.6% of requests, 0.4% of bandwidth. They spray `.env`, `.git/config`, and admin paths with no sophistication. The most persistent scanner (`172.94.9.253`) has a Censys “benign” verdict, which just means it’s not known malware, not that it’s welcome. The `l9scan` probes are the only ones with a recognizable signature. Nothing here requires a WAF; a static site with no dynamic endpoints has nothing for these probes to find.

3. **The weekend dip is real, but the site has a loyal core.** Friday is the busiest day, and Saturday (today) shows a sharp drop in every class except scanners and other crawlers, though today's counts cover only a partial day (the log window ends at 08:42 UTC). The IP persistence data reveals a handful of IPs that return over weeks, likely regular readers. The RSS feed has only 3 polling IPs, suggesting the audience is small but engaged. The site is not going viral, but it’s not a ghost town either.

---

*Generated via Caddy logs → DuckDB → Censys → Ollama → Astro*