Most stories about losing control to AI involve an event. A system conceals its intentions, slips its constraints, seizes infrastructure. There is a before and an after, and in the before, somebody could have pulled a plug.
In January 2025, six researchers published a paper arguing that the event may never arrive, and that humanity could be permanently sidelined anyway. "Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development," by Jan Kulveit, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger, and David Duvenaud, describes a failure with no villain, no breakout, and no obvious moment when stopping it was still possible. It was later presented at ICML 2025 as a position paper, a sign of how far the argument traveled beyond the AI safety community that first circulated it.
The paper deserves a careful reading, because it identifies a class of risk that most safety work, including most of the work we point to on this site, is not designed to catch.
Why the world currently answers to people
The argument starts with a question so basic it rarely gets asked: why do large systems serve human interests at all? An economy has no feelings about us. Neither does a state, or a culture. To the extent they serve people, they do so because they run on people. Economies need workers and customers. States need taxpayers, soldiers, administrators, and voters. Cultures need human minds to create them, carry them, and pass them on.
That dependence gives ordinary people two kinds of grip on the machinery. The first is explicit: votes, purchases, strikes, lawsuits, resignations. The second is quieter and probably matters more. A system that needs human participation has to keep humans able and willing to participate. It has an interest, however impersonal, in our being fed, educated, healthy, and at least minimally content. The paper calls this implicit alignment, and it has held, unevenly but persistently, through every technological shift so far.
AI severs the dependence. That is the whole argument, compressed. Once machines can do the work, make the decisions, produce the content, and fight the wars, the systems that once needed us no longer do. Nothing in that sentence requires any AI to be hostile, or even particularly capable of long-term planning. It only requires AI to be useful enough that handing things over keeps making local sense.
Three systems, drifting the same direction
The paper traces the mechanism through three domains, and the pattern repeats in each.
The economy
Human leverage in the economy comes from being needed as labor and courted as customers. As AI absorbs cognitive work, the labor half of that leverage shrinks toward zero. The customer half erodes more slowly but erodes too: an increasing share of economic activity becomes machine-to-machine, firms trading with firms, models negotiating with models, with human consumption a shrinking fraction of what the system optimizes for. Income concentrates with whoever owns the AI capital. Everyone else's economic vote gets smaller.
The state
Here the paper reaches for a historical analogy that has real evidence behind it: the rentier state. Governments funded by oil rather than by taxing citizens are, on average, measurably less democratic and less responsive. They can afford to be. The citizens are not where the money comes from. A state whose revenue flows from AI-driven industry, whose administration is automated, and whose security forces need few human hands is a rentier state with respect to its entire population. It does not have to become a tyranny. It simply stops having structural reasons to care what its people want, and structural reasons are the ones that hold when goodwill runs out.
Culture
Culture sounds like the soft case, and the paper argues it may be the most dangerous. Ideas, stories, and norms have always evolved under selection pressure, but the selection ran through human minds: an idea spread because people found it worth repeating. When most content is generated, filtered, recommended, and increasingly consumed by machines, ideas start evolving under a different pressure, optimized for engagement or persuasion rather than for serving the humans who host them. Culture is also the layer where we form our sense of what is normal and what is worth wanting. If that layer is being shaped by systems with no stake in human flourishing, we may lose the ability to even want to resist the rest of the drift.
The loops that close the exits
Any one of these trends might be caught and corrected. The paper's darker point is that the three domains reinforce one another, so the corrections get harder precisely as they become more necessary.
Economic power buys political influence, so the beneficiaries of automation get better at blocking rules that would slow it. States competing for growth and military advantage court AI capital and accelerate the handover; no government wants its rivals' datacenters. AI-shaped culture normalizes each new delegation, and people raised on machine-mediated everything find it hard to picture an alternative. Each system's drift removes a brake on the others. This is the same structural trap we describe in our piece on race dynamics, operating inside societies rather than between them, and it is why the paper treats the endpoint as potentially irreversible. Correction requires levers. The levers are what is being lost.
No one has to seize power. We only have to keep handing it over, one reasonable decision at a time.
Why there is no fire alarm
Takeover scenarios, whatever their probability, at least come with a tripwire. A system caught lying about its intentions, a treacherous turn, an escape attempt: these are events, and events can trigger responses. Gradual disempowerment offers nothing so convenient. Every step is voluntary. Every step is, in isolation, defensible. The firm that automates its analysts outcompetes the one that doesn't. The agency that lets a model draft its regulations clears its backlog. The politician who uses AI-optimized messaging wins.
Anyone who declines, on principle, to hand over their piece of the machine simply loses to someone who does not, and the sum of all those individually sensible choices is a civilization nobody chose.
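The shape of that trap can be made concrete with a toy two-player game. This is a minimal sketch, not a model from the paper: the payoff numbers are hypothetical, the action names `hold` and `automate` are ours, and we assume each payoff already folds in the long-run value of living in a world where humans keep their leverage.

```python
# Hypothetical payoffs as (row player, column player); higher is better.
# "hold" keeps humans in the loop, "automate" hands the task over.
PAYOFFS = {
    ("hold", "hold"): (3, 3),
    ("hold", "automate"): (0, 4),
    ("automate", "hold"): (4, 0),
    ("automate", "automate"): (1, 1),
}
ACTIONS = ("hold", "automate")


def best_response(opponent_action: str) -> str:
    """Return the row player's highest-payoff action against a fixed opponent action."""
    return max(ACTIONS, key=lambda mine: PAYOFFS[(mine, opponent_action)][0])


def is_dominant(action: str) -> bool:
    """True if `action` is the best response whatever the opponent does."""
    return all(best_response(other) == action for other in ACTIONS)


def joint_payoff(a: str, b: str) -> int:
    """Sum of both players' payoffs for a pair of actions."""
    return sum(PAYOFFS[(a, b)])


print("automate is dominant:", is_dominant("automate"))
print("joint payoff if both hold:", joint_payoff("hold", "hold"))
print("joint payoff if both automate:", joint_payoff("automate", "automate"))
```

Under these assumed numbers, automating is the better move whatever the other player does, yet both players end up worse off than if both had held back. Different payoffs would give a different game; the sketch only shows how individually sensible choices can sum to an outcome nobody would pick.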
The idea has a lineage. Paul Christiano's 2019 essay "What Failure Looks Like" sketched a world that goes out with a whimper, where systems optimize proxies for what we want until the proxies are all that is left. What the 2025 paper adds is structure: named mechanisms, the rentier-state evidence, the cross-domain feedback loops, and the blunt claim that this belongs in the same category as extinction-level threats, because a permanent, unrecoverable loss of human influence over the future is an existential catastrophe whether or not anyone dies on the day it becomes irreversible.
The strongest objections
The paper has attracted serious pushback, and some of it lands.
Ownership. The most common objection: humans own the capital. If AI does the work, the returns still flow to human shareholders, so humans keep the power that matters. The reply is that ownership is not a physical fact. It is a claim, enforced by courts, registries, regulators, and ultimately states, which are exactly the institutions the paper describes drifting out of human control. A share certificate is worth what the enforcement behind it is worth. And even if ownership holds perfectly, it holds for a remarkably small number of people, which is not a rebuttal to disempowerment so much as a description of it, with the values of a tiny group locked in for everyone.
Adaptation. Societies absorbed the printing press, the factory, and the computer, and human influence survived. The reply is that each of those technologies replaced particular tasks while increasing the value of human judgment somewhere else in the system. A general substitute for human cognition is not another rung on that ladder. It is the ladder ending. Past adaptation ran through mechanisms (new jobs, new political coalitions, new cultural movements) that all depended on humans being needed somewhere. That is the premise the new technology removes.
Simultaneity. Skeptics also note that the catastrophe requires the economy, the state, and culture to fail together, and compounding three uncertain forecasts should lower the probability. Fair, except the failures are not independent. The feedback loops are the point. The same underlying variable, how much each system still needs people, drives all three, which makes the failures correlated rather than coincidental.
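The arithmetic behind that reply is easy to check. The sketch below is illustrative only: the probabilities are hypothetical, not estimates from the paper, and it assumes a single shared condition (standing in for "how much each system still needs people") that raises or lowers every system's chance of failing.

```python
def joint_if_independent(p_fail: float, n_systems: int = 3) -> float:
    """Chance that all systems fail when each one fails independently."""
    return p_fail ** n_systems


def joint_with_shared_driver(p_driver: float, p_fail_if_driver: float,
                             p_fail_otherwise: float, n_systems: int = 3) -> float:
    """Chance that all systems fail when one shared condition moves them together.

    Failures are treated as independent only after conditioning on the driver.
    """
    return (p_driver * p_fail_if_driver ** n_systems
            + (1 - p_driver) * p_fail_otherwise ** n_systems)


def marginal_failure(p_driver: float, p_fail_if_driver: float,
                     p_fail_otherwise: float) -> float:
    """Overall failure chance of a single system, averaged over the driver."""
    return p_driver * p_fail_if_driver + (1 - p_driver) * p_fail_otherwise


# Hypothetical inputs: the driver is present half the time; a system fails
# 90% of the time when it is present and 10% of the time when it is not.
P_DRIVER, P_HIGH, P_LOW = 0.5, 0.9, 0.1

p_each = marginal_failure(P_DRIVER, P_HIGH, P_LOW)
print("per-system failure chance:", round(p_each, 3))
print("all three fail, independent:", round(joint_if_independent(p_each), 3))
print("all three fail, shared driver:",
      round(joint_with_shared_driver(P_DRIVER, P_HIGH, P_LOW), 3))
```

With these assumed inputs each system fails half the time in both cases, but the chance that all three fail together is about 0.125 under independence and about 0.365 with the shared driver. Multiplying the three forecasts understates the joint risk whenever a common cause sits behind them.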
Where the critics are most persuasive, we think, is on timing and completeness: the paper is a framework, not a forecast, and it says little about how fast any of this runs or which institutions crack first. It identifies a direction of pressure. It does not date the arrival.
What prevention would actually take
The uncomfortable feature of this risk is that the standard safety agenda does not touch it. Alignment, in the usual sense, makes each system do what its operator intends. Every system in the gradual disempowerment story can pass that test while the aggregate still drifts, because the problem is not disobedient machines. It is a civilization reorganizing itself around machines, obediently. We have written before about why loss of control is a systems problem rather than a single-model problem; this paper is the strongest academic statement of that case.
The authors' own proposals are early-stage and they say so. Measure human influence over key systems directly, the way economies measure inflation, so decline is visible while it is still correctable. Harden the mechanisms that keep institutions answerable to people, the unglamorous machinery of democratic oversight, disclosure, and law. And develop what they call ecosystem alignment: methods for keeping civilization-scale dynamics, not just individual models, anchored to human preferences.
To that list we would add the piece the paper gestures at but does not dwell on. Everything above is undermined by competition. A country that keeps humans in its loops accepts friction its rivals refuse, which is the same collective action problem that runs through every other route to losing control, and it has the same shape of answer: binding agreements that take the most corrosive forms of the race off the table, verified well enough that restraint is not a competitive sacrifice. That is the work this Foundation exists to advance.
Gradual disempowerment is, in one sense, the hardest version of the problem, a catastrophe assembled entirely out of locally rational choices. It is also, in another sense, encouraging: unlike a treacherous turn, it happens slowly enough to see, provided someone is measuring, and slowly enough to stop, provided the levers still work when we reach for them. The paper's contribution is to say, precisely and early, which levers to watch.
Common questions
What is gradual disempowerment?
Gradual disempowerment is the idea that AI could end meaningful human control over civilization without any takeover event. As AI replaces human labor, judgment, and participation across the economy, government, and culture, those systems stop depending on people. Systems that no longer need humans lose their built-in reasons to serve human interests, and the ordinary levers of correction, such as voting, spending, and labor, stop working. The term comes from a January 2025 paper by Jan Kulveit, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger, and David Duvenaud.
How is it different from an AI takeover?
A takeover scenario involves an agent: a misaligned AI system that deceives its overseers and seizes control at some identifiable moment. Gradual disempowerment needs no such agent and no such moment. Every individual AI system involved could be doing exactly what its operators want. The failure lives in the aggregate: thousands of locally sensible delegation decisions that, together, transfer the machinery of civilization to processes humans can no longer redirect. The two risks are not rivals, and a gradually disempowered society would also be easier for a misaligned system to seize.
What are the main objections?
Three come up most often. First, that humans will retain power through ownership, since people still hold the shares even if AI does the work. Second, that societies have adapted to every previous technology and will adapt again. Third, that the scenario requires many independent systems to fail at once. The paper's defenders respond that ownership is only as strong as the institutions enforcing it, which are the very institutions drifting out of human control; that past technologies replaced particular tasks rather than human participation as such; and that the failures are linked by feedback loops, so they arrive together rather than independently.
Can gradual disempowerment be prevented?
The paper argues it can be, but not by model alignment alone, because every system involved could be aligned in the narrow sense while the aggregate still drifts. Its proposed directions include measuring human influence over key systems the way economies measure inflation, hardening democratic and legal mechanisms that keep institutions answerable to people, and researching what it calls ecosystem alignment: keeping civilization-scale dynamics anchored to human preferences. Because competition punishes any single country that keeps humans in the loop while rivals do not, meaningful prevention also requires international coordination.