What Happens Inside a Language Model
The Setup
There is a moment in mechanistic interpretability work where a confusing ablation result stops being a failure and starts being a finding. This post is about...
May 6, 2026 — bigsnarfdude
Numbers from n=50 across base/SFT/IT on talkie-1930-13b and base/SFT/IT on OLMo-3-7B. All claims are Tier-1 (detector-level) unless explicitly noted. Recruit...
May 5, 2026 — bigsnarfdude
May 2, 2026 — bigsnarfdude
May 2026 — bigsnarfdude
May 1, 2026 — bigsnarfdude
Training Is the Attack Surface
The Probe Went Silent: Belief Drift and the Limits of Honesty Monitoring
LLM Brain Probes vs LLM Behavioural Language: Who Do You Trust?
The Penlight Tightens
April 23, 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
April 2026 — bigsnarfdude
March 2026 — bigsnarfdude
March 2026 — bigsnarfdude
March 2026 — bigsnarfdude
Two AI agents. One A10 GPU. One task: improve GSM8K pass@1 on Qwen 2.5 1.5B Instruct from a baseline of 0.620.
Welcome to my new site. I’ll be writing about AI safety research, alignment faking detection, and whatever else seems interesting.