MAIM: Mutual Assured AI Malfunction, Explained

"Superintelligence Strategy" arrived in March 2025 with an author list built to be taken seriously in Washington: Dan Hendrycks, director of the Center for AI Safety; Eric Schmidt, the former CEO of Google; and Alexandr Wang, the founder of Scale AI. Its purpose was to drag superintelligence out of the research literature and into the language of national security, and by that measure it succeeded. Within weeks it was being debated by arms control scholars, wargamed in think tanks, and cited in congressional testimony.

At its center is a single acronym, chosen with care. MAIM, for Mutual Assured AI Malfunction, is a deliberate echo of MAD, and it carries the paper's core claim: that the terrifying standoff which kept nuclear powers from annihilating each other has a counterpart in AI, one that already exists whether anyone acknowledges it or not.

We think the paper gets something genuinely important right, and we think its central mechanism cannot bear the weight placed on it. Both halves matter.

The standoff the paper says already exists

The argument runs like this. Suppose one state appears to be on the verge of a decisive breakthrough, a system that would grant, in the paper's framing, a superweapon and command of the future. No rival can afford to let that happen. A world where one capital controls superintelligence is, for every other capital, intolerable in the way a first-strike advantage was intolerable during the Cold War.

But unlike a deployed nuclear arsenal, an AI project in progress is vulnerable. It lives in datacenters with known coordinates, runs on power grids, and depends on months-long training runs that a well-placed intrusion can quietly corrupt. So a threatened rival has options short of war, and a ladder of them: espionage, covert cyberattacks that degrade a training run or poison its data, sabotage of the surrounding infrastructure, and, at the extreme rung, physical strikes on the facilities themselves.

The paper's claim is that this is a description as much as a prescription: the authors argue it already captures the strategic reality facing AI superpowers. Any state that makes an aggressive, visible bid for AI dominance should expect its project to malfunction, courtesy of its rivals. Hence the name, and hence the paradoxical hope: if every player knows a sprint for dominance will be sabotaged, no one sprints, and the most destabilizing move in the game is deterred. As with MAD, the authors argue the standoff can be stabilized through deliberate practice: siting datacenters away from cities to make the extreme rungs less catastrophic, keeping the cyber rungs of the escalation ladder clearly separated from the kinetic ones, and expanding transparency between rivals so that no one mistakes a commercial project for a dominance bid.

The other two pillars

MAIM got the headlines, but the strategy has three parts, and the second is the one we find most useful. Nonproliferation takes the machinery built to keep fissile material away from terrorists and applies it to AI: track advanced chips the way enriched uranium is tracked, secure model weights against theft, and build safeguards into systems so they refuse to help with bioweapon design or critical infrastructure attacks. This pillar assumes states cooperating on compute governance even while they compete on everything else, exactly as Washington and Moscow ran nonproliferation together through the worst years of the Cold War.

Competitiveness is the realist tithe: states should strengthen their economies and militaries with AI, and above all repair fragile chip supply chains through domestic manufacturing, a concern that has since hardened into the export-control politics now shaping the industry.

The paper positions this triad against two strategies it rejects: the hands-off race to build superintelligence first and hope, and what it dismisses as an unworkable global moratorium. Deterrence, in this telling, is the grown-up option between recklessness and naivety.

What the paper gets right

Start with the achievement. For years, the standard case for racing has been that a rival's superintelligence would be catastrophic, so we must get there first. "Superintelligence Strategy" quietly inverts this. If a unilateral bid for dominance is intolerable to rivals, then everyone's bid is intolerable to someone, and the sprint itself becomes the threat to national security rather than the answer to it. Coming from authors of this profile, that inversion did real work. It put loss of control and strategic instability on the desks of people who would never read an alignment paper, and it established, in respectable language, that restraint can be enforced rather than merely requested.

The nonproliferation pillar, meanwhile, is simply a piece of the treaty architecture we advocate, arrived at from a different direction. Chip tracking, weight security, and hardware-enabled governance are dual-purpose: they serve a deterrence regime and a verification regime equally well. Every dollar spent building them is a dollar spent making a future agreement checkable.

Where the nuclear analogy strains

The trouble begins when you ask what made MAD stable and check each element against its AI counterpart.

A nuclear launch is unambiguous. It is attributable within minutes, its origin known to the meter, and the response it triggers is certain and understood by everyone in advance. Now run the comparison. A "destabilizing" training run looks, from outside, like a commercial one: the same clusters, the same power draw, the same satellite signature. The threshold is undefined, not for lack of drafting effort but because capability does not announce itself the way a missile plume does. This is the problem we have called defining dangerous AI, and MAIM inherits it whole: a deterrence regime needs a red line, and the paper cannot say precisely where the line is.

Attribution fails in the other direction too. When a training run collapses, was it sabotage, a bug, or bad data? A state can be attacked without knowing it, which means the attack deters nothing, or believe itself attacked when it wasn't, which is worse. Deterrence works through visible, credible, attributable threats. Covert cyber operations are none of the three. Analysts at MIRI, RAND, and elsewhere pressed exactly these points within weeks of publication, and the observability critique in particular has never been satisfactorily answered: several researchers concluded that the conditions for stable deterrence simply do not hold yet, and might require deep, mutual transparency into rivals' datacenters before they could.

There is also the escalation ladder itself. Normalizing attacks on the critical infrastructure of nuclear-armed states, as routine statecraft, is not obviously a recipe for stability. The paper's answer, keep the cyber rungs well below the kinetic ones, assumes the target of a sabotage campaign shares your reading of which rung you are standing on. Cold War history is a catalog of moments when such readings diverged, and it offers another warning the paper does not dwell on: MAD nearly failed several times, through accident, false alarm, and misjudgment, and was rescued by luck as much as by doctrine. Adopting its logic means adopting its failure modes.

Deterrence needs an unambiguous tripwire and an unmistakable response. MAIM, so far, has neither.

What MAIM cannot do even if it works

Grant the framework everything, and a deeper limit remains. MAIM deters one specific failure: a state's deliberate, visible sprint for unilateral dominance. It does nothing about the failure mode that worries us most, because that one has no aggressor to deter.

Misalignment does not care which flag flies over the datacenter. A world of rival programs, each carefully staying below the sabotage threshold, is still a world racing toward systems nobody can control, with the race's usual pressure to cut corners on safety fully intact. Worse, MAIM adds a pressure of its own: projects under threat of sabotage have every reason to hide, harden, and disperse, and secrecy is the enemy of every safety practice worth having. Deterrence polices the competitors while the control problem itself goes unpoliced, and the slower, quieter routes to catastrophe, including the gradual disempowerment that requires no dominance bid at all, pass beneath its radar entirely.

A standoff is also not a resting state. MAD was tolerable because deterring the use of an existing weapon is a bounded task. MAIM must deter the development of a moving capability, forever, with thresholds that shift every year as algorithms grow more efficient and the compute frontier decentralizes. Perpetual crisis management is not a strategy. It is a countdown with good branding.

Deterrence and treaties are not rivals

Here is the irony in the paper's own analogy. Nuclear deterrence never stood alone. It was made survivable by the unglamorous machinery bolted onto it over decades: hotlines, inspections, the NPT, SALT, START, the verification protocols that let each side count the other's warheads. MAD supplied the fear; arms control converted the fear into rules before the fear could convert itself into war. Invoking the first half of that history while waving off the second, as the paper does when it dismisses coordinated restraint as utopian, is quoting the Cold War selectively.

And the two halves share an engine. To threaten sabotage credibly, a state must see into its rivals' AI programs deeply enough to know when a dominance bid is underway. That surveillance, the satellites, the chip tracking, the intelligence on training runs, is most of what a treaty on superintelligence needs for verification. The paper, read against its own intentions, is an argument that the hard part of a treaty is already being built.

So our verdict is narrower than either the paper's fans or its harshest critics would write. MAIM is valuable as a description: it names the standoff correctly, establishes that unilateral sprints will be resisted, and tells defense establishments that the race is a threat rather than a solution. It fails as a destination, because a permanent, undefined, covert standoff between nuclear powers is not stability, and because it leaves the actual source of existential risk, the systems themselves, ungoverned. Deterrence buys time. A verified international agreement is what the time is for.

Common questions

What is MAIM (Mutual Assured AI Malfunction)?

MAIM, short for Mutual Assured AI Malfunction, is a proposed deterrence regime for advanced AI, introduced in the March 2025 paper Superintelligence Strategy by Dan Hendrycks, Eric Schmidt, and Alexandr Wang. The idea is that any state making an aggressive bid for unilateral AI dominance should expect rivals to sabotage the project, through means ranging from espionage and cyberattacks on training runs up to, in extreme cases, physical strikes on datacenters. The authors argue this standoff already describes the strategic reality between AI superpowers, and that, like nuclear deterrence, it can be managed to produce stability.

Who proposed MAIM and what else does Superintelligence Strategy say?

The framework comes from Dan Hendrycks, director of the Center for AI Safety, former Google CEO Eric Schmidt, and Scale AI founder Alexandr Wang, in a paper released in March 2025. MAIM is one of three pillars. The second is nonproliferation: keeping weapons-capable AI away from rogue actors by tracking advanced chips the way fissile material is tracked, securing model weights, and building in safeguards. The third is competitiveness: states strengthening their economies and militaries with AI, including domestic chip manufacturing. The paper explicitly rejects both a hands-off race and a global moratorium, positioning deterrence as the realist middle path.

How is MAIM different from MAD?

Mutual assured destruction deterred the use of a finished, observable weapon. A nuclear launch is unambiguous, attributable within minutes, and answered by a certain response. MAIM tries to deter the development of a capability, and every element is blurrier: a destabilizing training run looks like an ordinary one from outside, sabotage can be denied or misattributed, and there is no shared definition of the threshold that triggers a response. Critics, including researchers at MIRI and elsewhere, argue these observability and credibility gaps mean the conditions that made MAD stable do not yet hold for MAIM.

Does MAIM make an AI treaty unnecessary?

No, and the comparison the paper itself invokes shows why. Nuclear deterrence never stood alone: it was stabilized by decades of arms control, hotlines, inspections, and nonproliferation agreements that defined thresholds and built verification. MAIM deters one failure mode, a unilateral sprint to dominance, but does nothing about the collective race toward systems nobody can control, since misalignment does not care which flag flies over the datacenter. Deterrence describes the standoff; a verified treaty is the exit from it. The surveillance a state needs to credibly threaten sabotage is much of the surveillance a treaty needs for verification.
