What is value lock-in in AI?

Value lock-in is the scenario in which a superintelligent AI permanently encodes a fixed set of values into the future, so that those values shape all outcomes from that point forward with no possibility of revision. The scenario is dangerous for two distinct reasons. First, the values being encoded may be wrong in ways not recognized at the time of encoding. Second, even if they are as good as current values can be, encoding them forecloses the moral progress that has historically improved human values over time. Lock-in is therefore a problem regardless of the quality of the values being locked in.

Why is value lock-in dangerous even if the values are good?

Because every historical era's 'good' values have contained serious errors that only became apparent later. Slavery was legally and morally accepted in ancient societies with sophisticated philosophical traditions. Treatment of women as property was widespread across cultures for much of recorded history. Practices that are now recognized as torture were considered legitimate punishment within living memory. People in each of these eras believed their values were sound. The pattern suggests that current values also contain errors not yet recognized. Locking in current values permanently means locking in those errors permanently, with no mechanism for future moral progress to correct them.

What is Nick Bostrom's astronomical waste argument?

Bostrom's astronomical waste argument starts from the observation that the future accessible to humanity, if it survives and spreads through the universe, involves an almost incomprehensible number of possible lives and experiences. Bostrom uses this scale to argue that reducing existential risk matters more than accelerating development. Applied to values, the same reasoning implies that even a small deviation from optimal values in a governing superintelligence would represent an astronomically large loss. The argument is usually cited to show the importance of getting values right. It also implies that lock-in to the wrong values, even values that are close to correct, would be one of the worst possible outcomes: an astronomical waste that could never be corrected.

Can a company or government benevolently lock in their values?

No entity has values good enough to justify permanent control of the future, and there is a deeper problem: the claim that any entity's values are good enough to justify lock-in is not assessable from within those values. The same confident moral certainty that makes lock-in seem acceptable is the feature that historical moral progress has repeatedly found to be misplaced. A world government with genuinely good values that locks those values in permanently still forecloses the possibility of discovering what was wrong with them. This is why value lock-in is a danger even from genuinely well-intentioned actors.

What is the alternative to value lock-in?

Maintaining a pluralistic, open future in which many actors with different values coexist and compete, with mechanisms for peaceful revision of values over time. This does not require any particular set of values to be correct — it requires only that the conditions for moral progress are preserved: diversity of perspectives, ability to debate and revise norms, no single actor with the power to permanently impose their values on all others. This is why democratic oversight of superintelligent AI, rather than control by any single entity (however well-intentioned), is important: the goal is to preserve the conditions for continued moral progress, not to identify the right values to lock in.

What Is Value Lock-In? The AI Risk of Permanent Values

Most AI safety discussions focus on the scenario in which AI systems pursue goals that are obviously bad — goals that lead directly to human harm. Value lock-in is a subtler and in some ways more disturbing scenario: the permanent encoding of goals that seem good at the time of encoding, but foreclose the moral progress that would eventually have improved them.

The concept is straightforward. A superintelligent AI system, or an AI-empowered entity, gains sufficient control over the world's resources and systems to permanently enforce whatever values it has been given or has adopted. From that point forward, those values shape all outcomes, with no possibility of revision. The lock-in may happen deliberately (an actor encodes their values intentionally) or accidentally (a system maximizes its objective so thoroughly that it pre-empts any possibility of alternatives). Either way, the future becomes fixed.

The historical argument against any lock-in

The most powerful argument against value lock-in is not philosophical but historical. Look back two centuries and examine what the people of that time considered obvious moral truths. Slavery was legally and socially accepted in much of the world, including in nations with sophisticated philosophical traditions and explicit commitments to human dignity. In many legal systems, women were treated in effect as the property of their husbands or fathers. Practices now recognized as torture were standard elements of criminal justice systems. Children had few enforceable rights against their parents or employers.

The people of those eras were not uniquely malicious. Many were thoughtful, morally serious people who reasoned carefully about ethics. They nonetheless arrived at positions that are catastrophically wrong by today's standards. And they were certain their values were sound: the same moral certainty that makes lock-in seem reasonable today.

The pattern continues. A century ago, explicit beliefs in racial hierarchy, now recognized as both factually false and morally grotesque, were mainstream in virtually all Western societies. Fifty years ago, homosexuality was still classified as a mental illness in major medical reference works; the American Psychiatric Association voted to remove it from its diagnostic manual only in 1973. The moral errors of previous generations are obvious in hindsight. It would be remarkable if our own era had finally achieved moral perfection and had no comparable errors awaiting future correction.

The key implication

If current values contain errors — and the historical pattern suggests they do — then permanently encoding current values is permanently encoding those errors. The future generations who would have discovered and corrected those errors will have no mechanism to do so. The lock-in does not just preserve the good in our current values. It preserves everything in them, errors included, forever.

Why even benevolent lock-in is dangerous

A common response to value lock-in concerns is: "What if the values being locked in are genuinely good?" This question misses the problem. The assessment that values are "genuinely good" must itself be made using the values under consideration. There is no external standpoint from which to evaluate whether current values are good enough to justify permanent encoding. The same confident moral certainty that makes lock-in seem acceptable is what subsequent moral progress has most often shown to be misplaced.

Nick Bostrom's astronomical waste argument adds a consideration of scale. The future accessible to humanity, across cosmic timescales, involves an almost incomprehensible number of possible lives and experiences. Even a small systematic deviation from optimal values, applied across this scale, represents a vastly larger loss than anything that can happen within a human lifetime. Lock-in to values that are 99% correct by some ideal standard is still a catastrophic outcome when the remaining 1% is multiplied across everything the future might contain.

The relationship to democratic oversight

Value lock-in is one of the central reasons why democratic and international oversight of superintelligent AI matters so much. Any single entity — a company, a government, even a well-intentioned foundation — whose values are encoded permanently into a superintelligent system has produced lock-in, regardless of how good those values appear at the time.

The alternative is not to identify the right values and encode those instead. No one can do this reliably, and the historical track record suggests that high confidence about having found the right values is itself a warning sign. The alternative is to preserve the conditions under which moral progress can continue: pluralism, diversity of perspectives, mechanisms for peaceful revision of norms, and no single entity with the power to permanently impose its values on all others.

This is the structural argument for democratic governance of superintelligent AI, and it goes beyond the instrumental argument that democratic governance produces better outcomes. Democratic oversight preserves the conditions for continued moral progress, regardless of which specific values are held at any moment. The Foundation's governance proposals are built around this insight: the goal is not to ensure the right values are in control, but to ensure that no single set of values, however well-intentioned, gains permanent, irreversible control.

QUICK ANSWERS

Common questions.

What is value lock-in?

The scenario in which a superintelligent AI permanently encodes a fixed set of values into the future, such that those values shape all outcomes from that point forward with no possibility of revision. Lock-in can happen deliberately (an actor encodes their values) or accidentally (a system maximizes its objective so thoroughly it pre-empts alternatives). The danger is not only that the values being encoded might be bad — it is also that encoding any fixed set of values forecloses the moral progress that has historically improved human values.

Why is value lock-in dangerous even if the values being locked in seem good?

Because the assessment that values are good must be made using those very values: there is no external standpoint from which to verify that current values are good enough to justify permanent encoding. History shows that the confidently held values of every era have contained serious errors that were later corrected by moral progress. Our current era almost certainly contains comparable errors not yet recognized. Permanent encoding locks in those errors with no mechanism for future correction.

Is value lock-in the same as totalitarianism?

Related but distinct. Totalitarianism is political and social control by a single authority; it is historically reversible and has been reversed many times. Value lock-in specifically refers to an AI-enabled permanent encoding of values in a way that cannot be reversed even by future generations with different values. A totalitarian regime can be overthrown; a sufficiently capable AI system that has locked in a set of values may be impossible to reverse because it can prevent and pre-empt any challenge. The irreversibility is what distinguishes lock-in from ordinary political domination.

What prevents value lock-in?

Preventing value lock-in requires ensuring that no single entity gains sufficient control over superintelligent AI to permanently encode their values — regardless of how good those values appear. This means international governance frameworks that distribute oversight across multiple parties, accountability mechanisms that preserve the ability to challenge and revise AI-enforced norms, and structural protections against any single actor achieving the kind of control that would enable lock-in. Democratic and international oversight of superintelligent AI is, at its core, a mechanism for preventing value lock-in.


The future needs room to improve.