---
title: "Running Ornith Locally With OpenCode and Claude Code"
description: "DeepReinforce's Ornith-1.0 models are solid agentic coders, but the upstream Ollama modelfile breaks tool calling out of the box. Here's the two-line fix, which variant fits your RAM, and how both the 35B and 9B performed on real honeypot triage work."
pubDatetime: 2026-06-27T16:30:00Z
author: hrbrmstr
---
> Original: [Running Ornith Locally With OpenCode and Claude Code](https://ai.rud.is/posts/2026-06-27-running-ornith-locally-with-opencode-and-claude-code)

DeepReinforce dropped [Ornith-1.0](https://deep-reinforce.com/ornith_1_0.html) last week – a family of MIT-licensed agentic coding models spanning 9B dense to 397B MoE, post-trained on Gemma 4 and Qwen 3.5. The architectural claim worth paying attention to is that the model jointly learns to write its own RL scaffold rather than relying on a fixed human-designed harness. Whether that holds at the sizes most of us can actually run locally is worth testing.

I packaged two variants for Ollama – [the 35B MoE](https://ollama.com/hrbrmstr/ornith-35b-fixed) and a [9B dense version](https://ollama.com/hrbrmstr/ornith-9b-fixed) for more constrained hardware – and wired them into OpenCode and Claude Code via `ollama launch`. Getting there required a non-obvious fix that isn't documented anywhere, so this covers what broke and why, then shows both models doing real work.

## What Broke Out of the Box

Pulling the upstream Ollama modelfile and pointing it at a tool-calling agent produces this immediately:

```
{"error":{"code":400,"message":"Unable to generate parser for this template.
Automatic parser generation failed: \n------------\nWhile executing
CallExpression at line 85, column 32 in source:\n...first %}\n
{{- raise_exception('System message must be at the beginnin...\n
Error: Jinja Exception: System message must be at the beginning."}}
```

Ollama's automatic Jinja parser bails when it hits `raise_exception()` branches in the template – even when those branches are unreachable at runtime. It can't statically verify they won't fire, so it gives up. The template itself is fine, so the problem is entirely on Ollama's parser generator side.

The fix is two extra lines in the modelfile, `PARSER` and `RENDERER`:

```
FROM /path/to/your/blob/sha256-ff25291b...
TEMPLATE """$(cat ornith.jinja)"""
PARSER qwen3.5
RENDERER qwen3.5
```

`PARSER` and `RENDERER` bypass auto-detection entirely and tell Ollama exactly what format to use. `qwen3.5` is the right choice because Ornith's base models are Qwen 3.5 derivatives. The template itself is verbatim from [upstream](https://huggingface.co/deepreinforce-ai/Ornith-1.0-397B/blob/main/chat_template.jinja) – no modifications needed.
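
If you want to build the modelfile yourself, here's a minimal sketch. It assumes the `$(cat ornith.jinja)` line above is shorthand for pasting the template contents between the triple quotes, since a Modelfile isn't run through a shell. `write_modelfile` is a hypothetical helper name, and the blob path is a placeholder you'd swap for your own.

```bash
# Hypothetical helper: writes a Modelfile that embeds a chat template verbatim
# and pins the parser and renderer.
#   $1 = path to the model blob (placeholder, use your own)
#   $2 = path to the chat template file (e.g. ornith.jinja)
#   $3 = output Modelfile path
# Assumes the template itself never contains a literal """ sequence.
write_modelfile() {
  blob=$1
  template=$2
  out=$3
  {
    printf 'FROM %s\n' "$blob"
    printf 'TEMPLATE """'
    cat "$template"
    printf '"""\n'
    printf 'PARSER qwen3.5\n'
    printf 'RENDERER qwen3.5\n'
  } > "$out"
}

# Example usage (commented out; "ornith-local" is a made-up model name):
# write_modelfile /path/to/your/blob ornith.jinja Modelfile
# ollama create ornith-local -f Modelfile
```

Writing the file with `printf` and `cat` rather than an unquoted heredoc keeps the shell from expanding anything inside the template.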

With that in place, launching into your agent is straightforward:

```bash
# OpenCode
ollama launch opencode --model hrbrmstr/ornith-35b-fixed

# Claude Code
ollama launch claude --model hrbrmstr/ornith-35b-fixed
```

The 35B modelfile is at [ollama.com/hrbrmstr/ornith-35b-fixed](https://ollama.com/hrbrmstr/ornith-35b-fixed). It's 21 GB with a 256K context window. On Apple Silicon with 64 GB (or more) unified memory the whole thing loads into Metal and stays there.

## Choosing Your Variant

On Apple Silicon the rough tiers are:

- 16 GB RAM: Q4_K_M (5.2 GB) on the 9B
- 32 GB RAM: Q6_K or Q8_0 for the 9B; the 35B becomes viable if you're careful
- 64 GB RAM: Q8_0 for the 9B without question; the 35B MoE runs cleanly

The `UD-Q4_K_XL` variant (5.9 GB) is an Unsloth dynamic quant that applies higher precision selectively to sensitive layers. It's worth trying if you're RAM-constrained but want better output than straight Q4.
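
The tiers above can be sketched as a small lookup. `pick_variant` is a hypothetical helper, the thresholds are just the rough numbers from the list, and the output strings are descriptive labels rather than real Ollama tags.

```bash
# Hypothetical helper: maps unified memory in GB to the rough tiers above.
pick_variant() {
  ram_gb=$1
  if [ "$ram_gb" -ge 64 ]; then
    echo "35B MoE, or 9B Q8_0"
  elif [ "$ram_gb" -ge 32 ]; then
    echo "9B Q6_K or Q8_0"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "9B Q4_K_M"
  else
    echo "below the tiers listed here"
  fi
}

# On macOS, hw.memsize reports physical memory in bytes, so something like
# this should feed it the right number (commented out; macOS only):
# pick_variant "$(( $(sysctl -n hw.memsize) / 1073741824 ))"
```

Treat the result as a starting point; context length and whatever else is resident in memory will move the line.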

## A Real Task: Honeypot Triage With Censys Enrichment

Both models got identical prompts against live honeypot data via [Honeylabs](https://honeylabs.net/) with Censys enrichment tools available:

> "give me a summary of what's going on at honeylabs and enrich key IPs with Censys"

No scaffolding, no examples, no chain-of-thought prompting – just the tool definitions and the ask.

### 35B: Strategic, 30-Day Window

The 35B chose a 30-day window unprompted and fired four parallel tool calls – top attackers by IP, country, ASN, and port – before touching any enrichment. Its reasoning before calling a single tool:

> "Let me gather recent attack data across multiple dimensions in parallel: top attackers by IP over the last 30 days, attack timeline to see trends, key stats on countries, ASNs, ports targeted. Let me fire these in parallel since they're independent queries."

That's the model doing actual planning, not me adding post-hoc narration. The parallel calls are visible in the tool call sequence.

After the first data pull it correctly identified which IPs warranted Censys enrichment and explained why: the Chile university IP for its sustained single-source volume (166K events over 16 days from `200.89.69.247` in AS23140), the Iran cluster (AS213790, 56 IPs, anomalous high ports 59869/25843 alongside 443/80), and the Google Cloud scanning wave (multiple `34.*` GCP IPs hammering ports 7860, 9990, 7272 – Gradio and MongoDB endpoints – on June 14 and 25).

The Iran cluster analysis held up on inspection. The stale Chrome UA string (`Mac OS X 10.6, Chrome/4.0`) appearing on traffic from `185.93.89.121` and related IPs is a real signal, and the model surfaced it correctly.

One concrete error: the final summary attributed the port pattern 7860/9990/7272 to the Universidad de Chile IP. Those ports belong to the Google Cloud scanning wave. The Chile IP's actual top ports from the raw data are 8081, 56084, 18376, 15297, 45607 – a completely different profile. It correctly identified both attacker clusters in the analysis section, then conflated them in the summary. Cross-source attribution at summary time is where it slipped.
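
This kind of slip is cheap to catch mechanically. Here's a rough sketch that checks the ports the summary claimed against the Chile IP's top ports from the raw data. `port_in_profile` is a hypothetical helper, and the port lists are the ones quoted above.

```bash
# Hypothetical helper: succeeds if PORT ($1) appears in a space-separated
# list of ports ($2).
port_in_profile() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Top ports for the Chile IP, as reported in the raw data.
chile_top_ports="8081 56084 18376 15297 45607"

# Ports the model's summary attributed to that IP.
for port in 7860 9990 7272; do
  if port_in_profile "$port" "$chile_top_ports"; then
    echo "$port: consistent with the raw data"
  else
    echo "$port: NOT in the raw top ports for this IP"
  fi
done
```

All three claimed ports fail the check, which is exactly the mismatch the summary glossed over.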

### 9B: Tactical, 48-Hour Window

The 9B defaulted to a 48-hour window and went straight to per-IP forensics. Its approach was more verbose in the thinking trace – "I detect investigation/research intent" appears twice, which reads like the model reassuring itself – but the actual tool calls were solid. Five parallel queries on the first pass, Censys enrichment on eight IPs, specific accurate host-level detail throughout.

It caught things the 35B's broader view missed: the Bucklog SARL Kubernetes cluster (185.177.72.0/24) being "actively exploited" (more on that "miss" later) from multiple vectors, a dedicated MikroTik credential-stuffing box (45.198.224.18) hunting port 8728 while itself running CVE-laden OpenSSH 8.9p1, and 160.119.71.136 flagged by GreyNoise for CVE-2025-55182 (React Server Components RCE) scanning activity.

One meaningful failure, though. Looking at the Bucklog SARL cluster showing up in the top attackers list, the 9B concluded:

> "These are **not the attackers** – they're the *target*. The honeypot is recording them being scanned/attacked by others"

That's backwards. A honeypot records inbound connections – things hitting the sensors. If an IP is in the top attackers list, it's a source, not a target. The 9B correctly parsed the data schema but got the domain semantics wrong; it understood what the fields meant but not what the system was measuring, or from whose perspective. The 35B didn't make this mistake.
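
To make the perspective concrete, here's a toy sketch of how a "top attackers" list falls out of honeypot logs. The two-column `src_ip dst_port` layout is a made-up stand-in, not the Honeylabs schema, and the addresses are documentation-range placeholders. The point is that every record is an inbound hit on a sensor, so the thing being counted is always the source.

```bash
# Reads whitespace-separated "src_ip dst_port" records on stdin (a
# hypothetical layout) and prints "count src_ip", highest count first.
top_sources() {
  awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' | sort -rn
}

# Made-up sample: each line is one inbound connection seen by a sensor.
printf '%s\n' \
  '192.0.2.10 22' \
  '192.0.2.10 23' \
  '192.0.2.10 8728' \
  '198.51.100.7 443' | top_sources
```

An IP only lands at the top of that output by connecting to the sensors repeatedly, which is why reading it as a victim doesn't hold up.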

Full session transcripts for both runs are at [git.sr.ht/~hrbrmstr/gists/tree/main/item/2026/2026-06-27-ornith](https://git.sr.ht/~hrbrmstr/gists/tree/main/item/2026/2026-06-27-ornith).

## What This Tells Us

The two models aren't competing on the same axis. The 35B "thinks" in campaigns and time windows; it's the right tool when you need to orient across a month of data and decide which signals are worth pursuing. The 9B goes narrow and deep – fast, thorough on per-host detail, and usable on hardware where the 35B isn't viable.

The domain semantics failure in the 9B is something folks should take pretty seriously. On tasks where the conceptual frame matters – where you *need* the model to understand what a system is measuring and from whose perspective – smaller models are more likely to produce plausible-sounding but wrong interpretations. That's not just an Ornith-specific problem; it's a more general pattern at this parameter range. It means you need to give the model more context up front, and you need to verify its output on anything where getting the frame wrong produces confidently wrong conclusions.

Both models got tool calling right, which was the actual goal of packaging these. The `PARSER`/`RENDERER` fix is the only thing standing between the upstream modelfile and a working agent session.

## Getting The Models

```bash
# 35B MoE
ollama run hrbrmstr/ornith-35b-fixed

# 9B dense
ollama run hrbrmstr/ornith-9b-fixed

# then launch into your agent (swap in the 9B name for constrained hardware)
ollama launch opencode --model hrbrmstr/ornith-35b-fixed
ollama launch claude --model hrbrmstr/ornith-35b-fixed
```

Both use the same template fix.

