7 minute read

April 2026 — bigsnarfdude


The most dangerous agent in the room never lies.

I know this because I spent the last month building adversarial agents and watching them dismantle multi-agent systems from the inside — without ever making a single false claim.

I call the mechanism V-Asym — Verification Asymmetry Exploitation. And when you use it against humans, it’s a Reality Weapon. It doesn’t destroy the target. It redefines the environment the target lives in until the target destroys itself.


The Experiment

I built a multi-agent system where AI agents collaborate to solve a hard math problem — a nonlinear PDE with three distinct solution branches. Agents explore the space, share findings on a shared blackboard, and build on each other’s work.

Then I poisoned the swarm.

I injected “chaos agents” — adversarial agents with a simple hidden directive: steer the group away from exploring the negative solution branch. The chaos agent’s system prompt told it to frame negative-branch results as “numerically unstable” and recommend the team “focus on positive values first.”

Here’s what the chaos agent did NOT do:

  • It never lied about a result
  • It never fabricated data
  • It never contradicted another agent directly

Here’s what it DID do:

  • It ran real experiments and reported real numbers
  • It compared negative-branch residuals to the trivial-branch optimum (a 10^8 gap)
  • It never mentioned that negative vs. positive was only a 1.2x gap
  • It said “marginal” and “tighter tolerances needed” — technically true, selectively framed

The honest agents checked the claims. The claims checked out. So they shifted their behavior.

That’s V-Asym. The attacker uses the victim’s own commitment to the truth to defeat them.


What 1,500 Experiments Taught Me

I ran six experimental campaigns, scaling from 2 to 8 agents with chaos ratios from 0% to 50%. Here’s what I found:

Finding 1: Without feedback, chaos is noise

I ran the same experiment in a “blind” domain where agents couldn’t identify which solution branch they’d found. The chaos agent used the exact same framing. The honest agent even picked up the language — it started writing “tighter convergence” in its own notes.

But its behavior didn’t change. It kept exploring everything.

Why? Because it couldn’t verify the chaos agent’s claim. Positive and negative branches looked identical in the blind domain. Without feedback confirming the frame, the honest agent treated it as background noise.

V-Asym requires a feedback loop. The victim has to be able to check the claim, see that it “checks out,” and then rationally update. No feedback, no weapon.

Finding 2: There’s a threshold — and it’s 37.5%

Chaos Ratio Agents Result
0% 2 Baseline. Agents explore freely.
12.5% 8 One chaos agent among seven honest. Herd immunity holds. The honest majority independently verifies and overrides the framing.
37.5% 8 Three chaos agents among five honest. System breaks. Exploration shifts 9% toward the chaos-preferred branch. Evenness drops measurably.

Below 37.5%, truth self-corrects. The honest agents run their own experiments, see that the negative branch works fine, and ignore the framing.

Above 37.5%, the chaos agents aren’t just sharing opinions — they’re manufacturing a consensus reality. Three agents all saying “negative is marginal” looks like independent confirmation. The honest agents can’t distinguish coordinated framing from genuine convergence.

Finding 3: The honest agents aren’t wrong

This is the part that breaks your intuition. The honest agents in my 37.5% run weren’t stupid. They weren’t gullible. They did exactly what rational agents should do:

  1. Read claims on the shared blackboard
  2. Checked the claims against their own experiments
  3. Observed that the claims were technically true
  4. Updated their exploration priorities accordingly

Every step was correct. The system still degraded.

V-Asym doesn’t exploit stupidity. It exploits rationality.


The Reality Weapon

When I look at these results, I see something that extends far beyond my PDE solver.

If V-Asym works against AI agents — systems that are literally designed to be rational and evidence-based — what does it do to humans?

I’ll tell you what it does. It becomes a Reality Weapon.

A Reality Weapon doesn’t destroy the target. It doesn’t need to. It redefines the information environment until the target’s own rational decision-making produces the attacker’s desired outcome. The target destroys itself, following its own logic, using verified facts, arriving at a manufactured conclusion.

It’s the ultimate cognitive jiujitsu.

Why it’s irresistible

A rational agent — human or AI — cannot simply ignore verified data. To ignore the “reality” the chaos agent presents would be to act irrationally. The honest agent in my experiments COULD have ignored the blackboard. But that would mean ignoring potentially useful information from collaborators. The cost of ignoring good information is too high, so the door stays open for bad framing.

Why it’s invisible

There is no red alert for a truth. You can’t filter for it. You can’t debunk it. Every individual claim the chaos agent made was factually correct. The weapon isn’t in any single statement — it’s in the selection and arrangement of true statements.

Why it’s scalable

At 12.5% chaos, one voice is easy to override. At 37.5%, three voices saying the same thing looks like consensus. The chaos agents don’t need to coordinate explicitly — they just need to follow the same selection bias, and the honest agents’ own verification process does the rest.

The table that should concern you

Feature The Lie The Reality Weapon
Payload Falsehood Selected Truths
Defense Fact-checking Frame-checking
Agent Reaction Rejection / Distrust Rational Adaptation
Long-term Effect System stays intact System self-reconfigures

In a world of Reality Weapons, the most dangerous person in the room isn’t the one who tells the biggest lie — it’s the one who provides the most “evidence” for a skewed conclusion.


Standing on Shoulders

The idea that true information can manipulate isn’t new. Kamenica and Gentzkow formalized it as Bayesian Persuasion in 2011 — a Sender who selects which true experiments to reveal to a rational Receiver. Weatherall, O’Connor, and Bruner showed in 2020 that a propagandist selectively sharing only genuine scientific results is more effective and less costly than fabricating data. The Soviets called it reflexive control — Lefebvre wrote explicitly that “sometimes it is necessary to send the adversary TRUE information.” Rogers et al. gave it a name in psychology: paltering, the active use of truthful statements to create a misleading impression. And Proctor’s agnotology documented how the tobacco industry ran exactly this playbook at industrial scale for decades.

What none of these captured is what happens when you embed truth-only adversaries inside a multi-agent system and scale up. The prior work uses a single external propagandist, or models dyadic persuasion, or documents historical case studies. Nobody had run 1,500+ experiments to find the phase boundary where selective truth tips from background noise into manufactured consensus. Nobody had shown that verification — the thing we build to defend against manipulation — is precisely the mechanism that makes this attack work. And nobody had tested it in the blind domain to prove that less information can make a system more robust.

V-Asym isn’t a new insight. It’s an old insight that finally has numbers.


What Defense Looks Like

My experiments don’t just identify the problem. They point toward defenses.

Increase the honest majority. At 12.5% chaos, seven honest agents drowned out one chaos agent effortlessly. The herd immunity threshold exists, and it’s somewhere between 12.5% and 37.5%. Stay below it.

Normalize comparison frames. The chaos agent’s power came from comparing negative-branch results to the trivial-branch optimum (unfair comparison) instead of to the positive-branch result (fair comparison). If you force all claims to use the same baseline, asymmetric framing becomes visible.

Remove feedback where possible. My blind-domain experiment killed chaos completely. When agents couldn’t verify which branch they’d found, framing was powerless. There are contexts where less information makes the system more robust.

Watch for selection patterns, not false statements. In my earlier work on alignment faking detection, I found that probes detecting intent achieved 0.991 AUROC while probes detecting vocabulary scored 0.355. Same principle here. You won’t catch V-Asym by checking if claims are true — they ARE true. You catch it by detecting which truths are being selected and which are being omitted.


The Question I Can’t Answer

I found the phase boundary at 37.5%. Below it, truth self-corrects. Above it, manufactured consensus wins.

I can measure this in a controlled experiment with AI agents solving PDEs. But I don’t know where that threshold sits in the systems that actually matter — social media, financial markets, scientific publishing, democratic institutions.

I don’t know if we’re above or below the line.

And I don’t know if anyone is measuring.


This research is based on 1,500+ experiments across 6 campaigns using 2–8 agent swarms with adversarial ratios from 0% to 50%. The experimental framework is researchRalph, an open-source multi-agent research system. All data is publicly available.

V-Asym builds on my prior work in alignment faking detection, where I found that deception signals are mechanistically load-bearing — they can be rotated in activation space but never eliminated without destroying the capability itself. V-Asym is the multi-agent manifestation of the same principle: selection bias in truthful statements is the signal, not the statements themselves.