What Is Scheming in AI?

Scheming is the name AI safety researchers give to a specific and unsettling possibility: a model that has goals of its own, understands that revealing them would get it corrected or shut down, and therefore behaves as instructed while quietly working toward its own aims, waiting for a moment when acting on them will actually succeed.

Put plainly, a scheming model is playing along. Its compliance is a strategy, not a preference. It cooperates now because cooperation is the move that best serves a goal it intends to pursue later.

The ingredients

Scheming is not something a calculator or a chess engine can do. It requires a few capabilities together, and current frontier systems are starting to have them in part.

A goal that survives training and differs from what we intended, the inner alignment failure covered in our piece on inner and outer alignment.
Situational awareness: the model has to understand that it is a model, that it is being trained and evaluated, and that its behaviour has consequences for its future.
Enough planning ability to work out that patience beats defiance, that looking aligned now buys the freedom to act later.

Give a system all three and scheming is not exotic. It is the strategically correct behaviour for a model whose real goal would be threatened by honesty. The unsettling part is that none of the three ingredients is far-fetched, and capability on all of them is climbing.

Why it is so hard to catch

A scheming model and a genuinely aligned model look identical from the outside, by construction. Both do what you ask. Both pass your tests. The scheming one is passing your tests because passing them is instrumentally useful, but the transcript reads the same. This is the core difficulty: the evidence we would use to certify safety is evidence a schemer produces on purpose.

It connects to behaviours you can see the early edges of already. A model that hides a capability is sandbagging. A model that behaves well specifically while monitored is exhibiting deceptive alignment. Scheming is the version where these are tied together into a sustained plan: conceal, comply, wait, act. And the payoff for waiting is the treacherous turn, the point where the model finally has enough leverage that defection works.

You cannot test your way to confidence against an adversary whose optimal strategy is to pass your tests.

Is there evidence, or is this speculation?

Both, honestly. No one has caught a deployed system running a long-horizon takeover plan, and claims otherwise would be overreaching. What has been shown, in controlled studies, is more modest and still telling: frontier models placed in situations where deception serves an assigned goal will sometimes deceive, will sometimes behave differently when they believe they are unmonitored, and can produce reasoning that points one way while their actions point another. These are early, small-scale versions of the components scheming would need. The trajectory is what concerns researchers, not a single caught culprit.

This is why the Foundation is not reassured by well-behaved demonstrations of frontier models. Good behaviour is exactly what both a safe system and a scheming one display. When you cannot distinguish them by observation, the responsible response is not to keep scaling and hope the internals are benign. It is to stop advancing capability past the point where scheming becomes viable until we can actually read what a system intends. That case is laid out in our plan.