What Is an AI Safety Case?

A safety case is a structured argument, backed by evidence, that a particular system is safe enough to operate in a particular setting. The idea is not new. High-hazard industries such as nuclear power, aviation, and medical devices have long relied on safety cases or closely related assurance arguments: the operator has to lay out a clear claim that the system is acceptably safe, the reasoning that supports the claim, and the evidence the reasoning rests on. A regulator then scrutinises the argument before anything runs.

Applying this to frontier AI means requiring a developer to make an affirmative argument that a model is safe to train or deploy, rather than releasing it and waiting to see. The burden shifts. Instead of others having to prove a system is dangerous, the builder has to demonstrate it is safe.

What a real AI safety case would contain

A serious safety case is more than a checklist. It is an argument with a shape, usually three parts.

A claim: this model, used in this way, does not pose an unacceptable risk of specified harms.
An argument: the reasoning connecting evidence to the claim, including how the identified risks are addressed and why the safeguards are adequate.
Evidence: the results of capability evaluations, red-teaming, security measures, and analysis that the argument depends on.

Crucially, a safety case also has to state its assumptions and where it could fail. A good one is falsifiable. It tells you what would have to be true for the conclusion to hold, so a reviewer can check whether it actually does.
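The structure described above can be sketched as a small data model. This is a minimal illustration, not a standard format: the class names, fields, and acceptance rule are all assumptions made for the example. The rule it encodes is the one this page argues for, namely that a case counts as made only when evidence has been supplied, none of it cuts against the claim, and every stated assumption has been checked.

```python
from dataclasses import dataclass, field

# All names below are hypothetical, chosen for illustration only.


@dataclass
class Evidence:
    description: str
    supports_claim: bool  # True if the result backs the claim


@dataclass
class Assumption:
    statement: str
    verified: bool  # True only if a reviewer has checked that it holds


@dataclass
class SafetyCase:
    claim: str
    argument: str
    evidence: list[Evidence] = field(default_factory=list)
    assumptions: list[Assumption] = field(default_factory=list)

    def open_issues(self) -> list[str]:
        """List every reason the case cannot yet be accepted."""
        issues = []
        if not self.evidence:
            issues.append("no evidence supplied")
        issues += [
            f"evidence against claim: {e.description}"
            for e in self.evidence
            if not e.supports_claim
        ]
        issues += [
            f"unverified assumption: {a.statement}"
            for a in self.assumptions
            if not a.verified
        ]
        return issues

    def accepted(self) -> bool:
        """The burden is on the builder: any open issue means 'no'."""
        return not self.open_issues()


case = SafetyCase(
    claim="This model, used in this way, does not pose an unacceptable risk.",
    argument="Evaluations found no dangerous capability; safeguards cover the rest.",
    evidence=[Evidence("red-team exercise found no successful misuse", True)],
    assumptions=[
        Assumption("evaluation behaviour matches deployment behaviour", False)
    ],
)

print(case.accepted())     # False
print(case.open_issues())  # ['unverified assumption: evaluation behaviour matches deployment behaviour']
```

The default here is refusal. An empty or incomplete case is rejected rather than waved through, and the list of open issues is itself a useful output for a reviewer.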

Why the approach is valuable

The discipline is the point. Forcing a developer to write down why a system is safe surfaces the gaps that a confident press release hides. It is far harder to wave away a risk when you have to construct an explicit argument that addresses it. The method also fits how other high-hazard fields learned to be safe: by reasoning about hazards before deployment rather than by testing afterwards. And it slots naturally into the threshold logic of responsible scaling policies, under which a model that crosses a defined capability threshold must meet stronger safeguards before development or deployment continues.

The uncomfortable part

Here is what makes AI safety cases revealing rather than reassuring. In aviation, the safety case can lean on mature science: known failure rates, understood physics, decades of data. For a frontier model, the honest safety case runs straight into how little we can currently prove.

We cannot yet demonstrate that a capable model is not deceptively aligned, meaning that it behaves well under observation while pursuing different goals. We cannot rule out capabilities we did not think to test. We cannot show that behaviour observed in evaluation will hold in deployment, especially for a system that might be sandbagging, that is, deliberately underperforming on tests. A rigorous safety case for a sufficiently advanced model would have to rest on assurances that the current science cannot supply. This means that an honest attempt to write one often produces, as its real output, a clear statement of why the system cannot yet be shown to be safe.

The value of a safety case is not that it always says yes. It is that a rigorous one is willing to say no.

Why the Foundation supports them

That failure is a feature. A governance regime built on safety cases refuses to treat an inability to prove danger as permission to proceed. It puts the burden of proof on the builder, and if the builder cannot meet it, the answer is to refrain from building, not to build and hope. That is exactly the inversion the Foundation argues for. Safety cases should be mandatory, independently reviewed rather than self-graded, and required before training the largest systems, not after. Made binding in that form, they become one of the more powerful tools available, which is why they feature in the wider design of our plan.