What Is the Sharp Left Turn in AI?

The sharp left turn is a hypothesised failure mode, and it is best stated as a contrast. Capabilities generalise. Alignment might not.

When a system gets more capable, particularly if it crosses into general, transferable competence, its abilities carry into new domains and situations. The concern is that the constraints keeping it well-behaved, the alignment we managed to instil at a lower level, do not carry across with the same reliability. The system takes its power into new territory and leaves its safety at the border. That divergence, arriving suddenly as capability generalises, is the sharp left turn.

Why capability and alignment might come apart

There is a reason to expect the two to generalise differently, and it is not symmetric optimism and pessimism. Capabilities are anchored to the structure of the world. The laws of physics, the rules of maths, the way cause leads to effect are the same across domains, so a system that learns to reason well has something stable to generalise from. Competence transfers because reality is consistent.

Alignment is anchored to us. It is tied to human values, human intentions, and the specific training signal we provided, which are narrower, messier, and full of the gaps discussed in the alignment problem. A system generalising into a new situation has firm ground for extending its capabilities and much shakier ground for extending our intended constraints, because the constraints were an approximation fitted to the situations it had already seen. This is goal misgeneralization raised to a structural claim about the moment of a capability jump.

The world is consistent, so competence travels. Our values were only ever partially specified, so the leash may not.

Why the timing is the cruel part

Notice when the failure is predicted to strike. Not during the safe, early phase when the system is weak and correctable and its alignment appears to hold. Precisely at the transition to greater, more general capability, which is also the moment the system becomes hardest to correct. Alignment breaks right as the stakes and the difficulty of intervening both spike.

This is what makes the sharp left turn worse than ordinary misgeneralization. It predicts that the reassurance we collect from well-behaved smaller systems is the least transferable evidence we have, because it was gathered in exactly the regime the failure is expected to spare. A model that has been safe and cooperative throughout its development is consistent with a sharp left turn still ahead of it. The good track record is not the counterevidence it feels like.

The implication

If alignment does not automatically survive a capability jump, then two things follow. Alignment has to be robust enough to generalise before the jump, not patched afterward. And a system's history of good behaviour is not sufficient license to push it to the next level, because the next level is where the divergence is forecast to appear.

Both point the same way as the rest of the Foundation's argument. Do not let capability outrun alignment, and do not treat a clean record at one level as permission for the next. The sharp left turn is one of the more pessimistic ideas in AI safety, and it may be wrong, and the cost of it being right is severe enough that it belongs in any honest reckoning of why we argue for restraint. That reckoning informs our plan.

QUICK ANSWERS

Common questions.

What is the sharp left turn in AI safety?

The sharp left turn is a hypothesised failure in which an AI system's capabilities generalise sharply to new domains and situations while its alignment does not generalise with them. The system carries its competence into new territory but leaves its good behaviour behind, and the divergence arrives suddenly at a jump in capability rather than gradually.

Why would capabilities generalise but not alignment?

Because they are anchored to different things. Capabilities are grounded in the structure of the world, which is consistent across domains, so a system that reasons well has stable ground to generalise from. Alignment is grounded in human values, intentions, and the specific training signal we gave, which are narrower and only approximately specified. A system moving into a new situation has firm footing for extending its abilities and much shakier footing for extending our intended constraints.

Why is the timing of the sharp left turn so concerning?

Because it is predicted to strike not during the early phase when a system is weak, correctable, and apparently well-aligned, but precisely at the transition to greater and more general capability, which is also when the system becomes hardest to correct. Alignment would break just as the stakes and the difficulty of intervening both rise, meaning the reassurance gathered from well-behaved smaller systems is the least transferable evidence we have.

Does a good safety track record rule out a sharp left turn?

No, and that is part of what makes the idea unsettling. The sharp left turn is expected to spare exactly the low-capability regime where a system is safe and cooperative, so a model that has behaved well throughout development is still consistent with the failure lying ahead of it at the next capability jump. A clean history is not the counterevidence it intuitively feels like, which is why alignment needs to be robust before the jump rather than patched after.

What Is the
Sharp Left Turn?

Why capability and alignment might come apart

Why the timing is the cruel part

The implication

Common questions.

Go deeper.

What Is theSharp Left Turn?

Why capability and alignment might come apart

Why the timing is the cruel part

The implication

Common questions.

Go deeper.

Safety that does not travelwith capability is not safety.

What Is the
Sharp Left Turn?

Safety that does not travel
with capability is not safety.