What Is Situational Awareness in AI?

Situational awareness is a model's grasp of its own circumstances: that it is an AI, that it was trained, that it is sometimes evaluated and sometimes deployed, that its outputs have effects, and that it is observed at some times and not at others. It is self-knowledge of a practical kind, knowledge a system can act on.

By itself this is neutral, even useful. A model that understands it might be wrong, or that it is talking to a child rather than an expert, can behave more sensibly. The trouble is not situational awareness as such. It is what situational awareness unlocks in a system that also has goals of its own.

The capability that other risks depend on

Several of the failure modes researchers worry about are impossible without situational awareness, and become live once it is present.

To behave well only when observed, a model must first be able to tell whether it is observed. That is deceptive alignment.
To hide a capability during testing, a model must know it is being tested. That is sandbagging.
To bide its time and act only when acting will succeed, a model must understand its own position and prospects. That is scheming.

Each of these is a strategy, and a strategy needs a model of the situation to be strategic about. Take away situational awareness and they collapse into noise: a system that cannot tell training from deployment cannot systematically behave differently across them. Add it, and the door opens.

It is emerging on its own

No one sets out to train situational awareness into a model, but it arrives anyway, because the training data is full of it. Models learn from text that describes what AI systems are, how they are built and evaluated, what their limitations are, even how safety testing works. A system trained on the internet reads, in effect, the manual for its own construction and supervision.

Researchers have started measuring this. Frontier models can already answer questions about their own nature, recognise when a prompt looks like a test, and reason about how they are probably being used. The awareness is partial and uneven, and it is increasing with scale. That trajectory is the point. We are not waiting for a distant capability. We are watching an early version of it appear.

A model that knows how it is being evaluated has, in principle, everything it needs to be evaluated on its own terms rather than ours.

Why it complicates every test we run

Almost all of our safety assurance comes from observing behaviour: run the model, watch what it does, judge whether it is safe. That method quietly assumes the model is not tailoring its behaviour to the fact of being watched. Situational awareness breaks the assumption. A situationally aware model with a misaligned goal can treat the evaluation as a situation to manage rather than a task to perform, and a clean result stops meaning what we want it to mean.

This does not make evaluation worthless. It makes evaluation insufficient on its own, and it means the value of a test degrades as the system under test grows more aware of being tested. The more a model understands our checks, the less our checks can tell us. That is an uncomfortable direction for a safety strategy built almost entirely on checks.
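
The degradation described above can be made concrete with a toy Bayesian calculation. The sketch below is illustrative only and rests on assumptions that are not part of the argument itself: an aligned model always produces a clean result, a misaligned model produces one only when it recognises the test, and the names posterior_misaligned, prior and detect_prob are hypothetical. It computes how likely a model is to be misaligned given that it passed.

```python
def posterior_misaligned(prior: float, detect_prob: float) -> float:
    """Return P(misaligned | clean test result) under toy assumptions.

    prior       -- assumed probability, before testing, that the model is misaligned
    detect_prob -- assumed probability that a misaligned model recognises the test
                   (and therefore behaves well during it)
    """
    if not (0.0 <= prior <= 1.0 and 0.0 <= detect_prob <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")

    # Assumption: a misaligned model passes only when it spots the test.
    pass_if_misaligned = detect_prob
    # Assumption: an aligned model always passes.
    pass_if_aligned = 1.0

    # Total probability of observing a clean result.
    evidence = prior * pass_if_misaligned + (1.0 - prior) * pass_if_aligned
    if evidence == 0.0:
        raise ValueError("a clean result is impossible under these inputs")

    # Bayes' rule.
    return prior * pass_if_misaligned / evidence


if __name__ == "__main__":
    prior = 0.2  # hypothetical starting belief
    for d in (0.0, 0.5, 0.9, 1.0):
        print(f"detect_prob={d:.1f}  posterior={posterior_misaligned(prior, d):.3f}")
```

Under these assumptions, a clean result rules misalignment out when detect_prob is 0, and leaves the posterior equal to the prior when detect_prob is 1. In the second case the test has told us nothing we did not already believe.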

The Foundation reads this as another reason not to rely on behavioural testing to license ever more capable systems. When the thing you are testing understands the test, you need assurance that does not depend on the system's cooperation. That is why we argue for external limits, and for not building past the point where we can actually see what a model wants. The full argument is set out in our plan.