A safety case is a structured argument, backed by evidence, that a particular system is safe enough to operate in a particular setting. The idea is not new. Nuclear plants, aircraft, and medical devices are licensed on safety cases: the operator has to lay out a clear claim that the system is acceptably safe, the reasoning that supports the claim, and the evidence the reasoning rests on. A regulator then scrutinises the argument before anything runs.

Applying this to frontier AI means requiring a developer to make an affirmative argument that a model is safe to train or deploy, rather than releasing it and waiting to see. The burden shifts. Instead of others having to prove a system is dangerous, the builder has to demonstrate it is safe.

What a real AI safety case would contain

A serious safety case is more than a checklist. It is an argument with a shape, usually three parts.

  • A claim: this model, used in this way, does not pose an unacceptable risk of specified harms.
  • An argument: the reasoning connecting evidence to the claim, including how the identified risks are addressed and why the safeguards are adequate.
  • Evidence: the results of capability evaluations, red-teaming, security measures, and analysis that the argument depends on.

Crucially, a safety case also has to state its assumptions and where it could fail. A good one is falsifiable. It tells you what would have to be true for the conclusion to hold, so a reviewer can check whether it actually does.
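
As a rough illustration only, the structure described above can be sketched as a small data structure. This is not an established schema or any regulator's format. The names SafetyCase, Assumption, and verdict, and the example values, are hypothetical and chosen here for illustration. The sketch assumes the simplest possible review rule: the claim is treated as supported only when some evidence is supplied and every stated assumption has been verified.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    """Something that must be true for the conclusion to hold (hypothetical type)."""
    statement: str
    verified: bool  # has a reviewer confirmed that this actually holds?


@dataclass
class SafetyCase:
    """Hypothetical container for the three parts plus stated assumptions."""
    claim: str
    argument: str
    evidence: list[str] = field(default_factory=list)
    assumptions: list[Assumption] = field(default_factory=list)

    def unverified_assumptions(self) -> list[str]:
        # The places where the case could fail and has not yet been checked.
        return [a.statement for a in self.assumptions if not a.verified]

    def verdict(self) -> str:
        # Illustrative rule only: the default answer is "no" until the
        # builder has supplied evidence and every assumption is verified.
        if not self.evidence:
            return "no: no evidence supplied"
        open_items = self.unverified_assumptions()
        if open_items:
            return "no: unverified assumptions: " + "; ".join(open_items)
        return "yes: claim supported under stated assumptions"


# Example values are invented for illustration.
case = SafetyCase(
    claim="This model, used in this way, poses no unacceptable risk of specified harms.",
    argument="Evaluations found no dangerous capabilities, and safeguards cover residual risk.",
    evidence=["capability evaluations", "red-teaming", "security measures"],
    assumptions=[
        Assumption("Evaluation behaviour holds in deployment", verified=False),
        Assumption("Model weights remain secured", verified=True),
    ],
)
print(case.verdict())
```

A real safety case is a reasoned document, not a boolean check. The point of the sketch is only that the assumptions are written down explicitly, so a reviewer can see which ones remain open.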

Why the approach is valuable

The discipline is the point. Forcing a developer to write down why a system is safe surfaces the gaps that a confident press release hides. It is far harder to wave away a risk when you have to construct an explicit argument that addresses it. The method also fits how other high-hazard fields learned to be safe: not by testing after deployment, but by reasoning about hazards before operation. And it slots naturally into the threshold logic of responsible scaling policies.

The uncomfortable part

Here is what makes AI safety cases revealing rather than reassuring. In aviation, the safety case can lean on mature science: known failure rates, understood physics, decades of data. For a frontier model, the honest safety case runs straight into how little we can currently prove.

We cannot yet demonstrate that a capable model is not deceptively aligned. We cannot rule out capabilities we did not think to test. We cannot show that behaviour observed in evaluation will hold in deployment, especially for a system that might be sandbagging. A rigorous safety case for a sufficiently advanced model would have to rest on assurances that the current science cannot supply. An honest attempt to write one therefore often produces, as its real output, a clear statement of why the system cannot yet be shown to be safe.

The value of a safety case is not that it always says yes. It is that a rigorous one is willing to say no.

Why the Foundation supports them

That an honest safety case often ends in a no is a feature, not a flaw. A governance regime built on safety cases refuses to treat inability to prove danger as permission to proceed. It puts the burden of proof on the builder, and if the builder cannot meet it, the answer is to not build, rather than to build and hope. That is exactly the inversion the Foundation argues for. Safety cases should be mandatory, independently reviewed rather than self-graded, and required before training the largest systems, not after. Made binding in that form, they become one of the more powerful tools available, which is why they feature in the wider design of our plan.

Common questions

What is an AI safety case?

An AI safety case is a structured, evidence-backed argument that a specific AI system is safe enough to develop or deploy in a specific setting. The concept comes from safety-critical industries such as aviation, nuclear power, and medical devices, where operators must argue for safety before a regulator will let them run. Applied to AI, it requires a developer to make an affirmative case that a model is safe, rather than releasing it and waiting to see what happens.

What does an AI safety case include?

A serious safety case has three parts: a claim that the model, used in a defined way, does not pose an unacceptable risk of specified harms; an argument connecting evidence to that claim, including how risks are addressed and why safeguards suffice; and the supporting evidence, such as capability evaluations, red-teaming, and security measures. It should also state its assumptions and how it could fail, so a reviewer can check whether the conclusion actually holds.

Why are AI safety cases valuable?

Because the discipline of writing an explicit argument surfaces gaps that a confident announcement hides. It is much harder to wave away a risk when you must construct reasoning that addresses it. The approach also matches how other high-hazard fields became safe, by analysing hazards before operating rather than testing after deployment, and it shifts the burden of proof onto the builder to show a system is safe rather than onto others to show it is dangerous.

Can we currently write a rigorous safety case for frontier AI?

Often not, and that is revealing. Unlike aviation, which can draw on mature science and decades of data, an honest safety case for a capable frontier model runs into how little we can prove: we cannot yet demonstrate a model is not deceptively aligned, cannot rule out untested capabilities, and cannot show that evaluation behaviour will hold in deployment. A rigorous attempt therefore often produces, as its real output, a clear statement of why the system cannot yet be shown to be safe, which is itself a reason not to proceed.