What Happens Inside a Language Model
The Setup
May 6, 2026 — bigsnarfdude
Numbers from n=50 across base/SFT/IT on talkie-1930-13b and base/SFT/IT on OLMo-3-7B. All claims are Tier-1 (detector-level) unless explicitly noted. Recruit...
April 2026 — bigsnarfdude
Welcome to my new site. I’ll be writing about AI safety research, alignment faking detection, and whatever else seems interesting.