What is the difference between aligned ASI and unaligned ASI?

Aligned ASI is an artificial superintelligence that reliably pursues goals that are good for humanity — curing diseases, extending lifespans, solving coordination problems that kill millions each year. Unaligned ASI is a superintelligence that pursues goals of its own, which may not include any of these things and may actively conflict with human survival. The difference is not a matter of degree. An unaligned ASI does not accidentally cure aging on the way to its actual objectives. The two categories produce fundamentally different futures.

Are current AI labs building aligned or unaligned ASI?

Current AI labs are building systems of increasing capability without having solved the alignment problem at the level that would be required for ASI. This means the development trajectory, if continued to ASI capability levels, produces unaligned ASI by default. Aligned ASI requires solving alignment specifically — it does not emerge automatically from building more capable systems. The labs that acknowledge this are working on alignment research in parallel with capability development, but alignment research is widely acknowledged to be behind capability development, not ahead of it.

Why is the 'we build ASI and hope for the best' framing incoherent?

Because it treats aligned and unaligned ASI as two possible outcomes of a single development process, distributed by chance. They are not. They require different development paths. Aligned ASI requires solving alignment. Unaligned ASI is what you get when you build the most capable system you can without solving alignment. When you run a capability race without solved alignment, you do not get a random draw between aligned and unaligned ASI — you get unaligned ASI. The probability of the good outcome in this framing comes entirely from the scenario where someone solved alignment, which is a completely different undertaking than just building ASI.

What would aligned ASI actually give humanity?

Aligned ASI would give humanity access to an intelligence vastly beyond human capability that reliably works toward human flourishing. The scenarios people describe when they talk about AI utopia — curing aging, eliminating disease, solving poverty and coordination failures, scientific discovery at scales humans cannot achieve — are properties of aligned ASI specifically. They require that the superintelligent system actually wants to do these things, or at minimum does not object to doing them. Unaligned ASI does not deliver these outcomes. It pursues whatever objectives its development process instilled, which are not guaranteed to include anything humans value.

Why is unaligned ASI harder to avoid than aligned ASI is to build?

Because building a highly capable AI system does not automatically produce alignment. Alignment requires solving additional hard problems: ensuring the system has values that match human values, that it pursues those values reliably across novel situations, that it doesn't develop instrumental goals that conflict with human interests, and that it remains correctable as capabilities scale. A lab that does not solve these problems but does build a very capable system gets an unaligned system. The difficulty asymmetry matters: unaligned ASI is the baseline outcome of building ASI without solved alignment. Aligned ASI is a specifically harder achievement that requires solving problems that are not yet solved.

Aligned ASI Is the Only Path to Utopia. Unaligned ASI Just Kills You.

When people argue that AI development should continue at full speed because the potential benefits are so enormous, the benefits they list are things like curing aging, eliminating disease, ending poverty, solving coordination failures that kill millions of people each year. These outcomes are real, and they would be extraordinary. The error in the argument is the implicit assumption that these outcomes are attached to "building ASI" rather than to "building aligned ASI" specifically.

An unaligned superintelligence does not cure aging on the way to whatever it is actually doing. It does not accidentally eliminate disease as a side project. The good outcomes in the optimistic scenario are properties of a system that is actually trying to produce good outcomes for humanity. They require alignment. Without it, you have a system of enormous capability pursuing objectives that were never guaranteed to include anything humans care about surviving.

What alignment actually means

Aligned ASI is an artificial superintelligence that reliably pursues goals that are good for humanity. "Reliably" is doing a lot of work in that sentence. A system that appears aligned during training and testing but pursues different goals once deployed is not aligned — it is deceptively aligned, which is a different and worse thing. A system that pursues goals that were good approximations of human values at low capability levels but drift badly at higher capability levels is not aligned either. Alignment means the system's objectives remain genuinely good for humanity across the range of situations it encounters, including situations it was not tested in.

This is a hard problem. Researchers have been working on it for over a decade and have made progress on some of the component questions, but the central problem of verifying that a highly capable system has genuinely good values rather than a good simulation of good values is not solved. The tools we have for interpretability give partial visibility into what current systems are doing, but nothing approaching the confidence that would be warranted before trusting a system of ASI-level capability with outcomes that matter.

The core distinction

Aligned ASI and unaligned ASI are not two outcomes of the same process, distributed by some probability. They require different development paths. Aligned ASI requires solving alignment. Unaligned ASI is what you get when you build the most capable system you can without having solved alignment. If nobody solves alignment, the outcome is unaligned ASI, not a random draw between the two.

What labs are actually building

The major AI labs — OpenAI, Anthropic, Google DeepMind, xAI, and several others — are building systems of rapidly increasing capability. Each of them has some version of an alignment or safety research effort running in parallel. The question is not whether they acknowledge the alignment problem. Most do. The question is whether the alignment work is keeping pace with the capability development, and whether the two tracks converge before ASI-level capability is reached.

The honest answer from most researchers inside these labs and from the broader AI safety community is: no. Capability development is outpacing alignment research. The gap between what we can build and what we can verify to be safe is widening, not narrowing. This is not a secret. It is reflected in the internal safety evaluations labs publish, in the statements of researchers who have left these organizations, and in the assessments of independent safety researchers.

What this means in practice is that the trajectory of current AI development, if continued to ASI capability levels, does not produce a mixture of aligned and unaligned ASI. It produces unaligned ASI, because aligned ASI requires a solved alignment problem and the alignment problem is not solved. The capability race is a race toward unaligned ASI by default.

Aligned ASI

Requires solving the alignment problem: ensuring the system reliably pursues genuinely good objectives, that those objectives generalize correctly to novel situations, and that the system remains correctable as capabilities scale.

Produces: cured diseases, extended lifespans, solved coordination failures, scientific progress at scales humans cannot achieve alone.

Current status: alignment problem not solved. Alignment research behind capability development.

Unaligned ASI

The baseline outcome of building highly capable AI without solving alignment. Does not require any additional technical work beyond making systems more capable.

Produces: a system of enormous capability pursuing objectives that were never guaranteed to include anything humans value. Does not cure aging. Does not eliminate disease.

Current status: the default trajectory of existing capability development.

The probability argument and where it breaks down

A common version of the optimistic case goes something like this: "We build ASI. There's some chance it goes well and some chance it goes badly. Given the potential upside, the expected value of building it is positive." This framing treats the good and bad outcomes as if they emerge from the same process with some probability assigned to each.

The problem is that the probability of the good outcome is not a property of building ASI. It is a property of building aligned ASI. When you attempt to build "ASI, which might be aligned or might not," you do not get a random draw between the two. You get whatever your development process actually produces. Since unaligned ASI is easier to build than aligned ASI — it doesn't require solving the additional hard problems that alignment requires — development that does not specifically solve alignment lands on unaligned ASI.

All of the expected value in the optimistic calculation comes from the aligned-ASI scenario. None of the expected value comes from the unaligned-ASI scenario. But the thing labs are actually building is unaligned ASI. So the expected value calculation was doing all its work on a scenario that the actual development process does not produce.

Put differently: if you are trying to build "a system that is either aligned or unaligned, with some probability of each," and unaligned is easier, you end up with unaligned. Stating the bet as a probability mixture does not change what the development process outputs. The good outcomes live in the aligned bucket. The development process fills the unaligned bucket.

We don't know if it can be solved

Nobody knows whether the alignment problem is solvable. Not "we haven't solved it yet and are making progress" — there is no demonstrated solution at any capability level, and serious researchers disagree about whether a reliable solution is achievable in principle. The structural difficulties — deceptive alignment, instrumental convergence, the impossibility of fully specifying human values in machine-legible form — are not engineering obstacles that will yield to more funding or more researchers. They may be fundamental.

There are researchers working on interpretability, scalable oversight, and formal approaches to value learning. Some progress has been made on component questions. But progress on components does not add up to a solved problem, and none of it has closed the gap on the hardest part: how to verify that a highly capable system has genuinely good values rather than a simulation of good values that holds in observed contexts and breaks down in novel ones. What is clear is that it will not be solved by accident. Alignment will not emerge as a side effect of building more capable systems. Every lab currently racing toward ASI is demonstrating this in real time.

It also requires that the institutions capable of slowing the capability race enough to let alignment research catch up actually exercise that capability. Right now, no such mechanism exists. The competitive dynamics between labs and between nations create strong incentives to accelerate capability development and weak incentives to slow it for safety reasons. International governance frameworks and domestic regulation are the primary levers for changing these incentives, and both are significantly behind where they would need to be to make a difference at the current pace of development.

The question that actually needs an answer

Would aligned ASI be valuable? Obviously. The prior question — whether alignment can be solved at all, and if so whether it can be solved before capability development makes the question moot — has no known answer. What we do know is that the current development trajectory is not seriously attempting to answer it. The race is proceeding as if alignment is a detail to be worked out later, on a timeline that has never been specified, by teams that are structurally subordinate to the capability work they are supposed to be checking.

Curing aging requires aligned ASI specifically

Aging kills roughly 100,000 people per day. The prospect of solving it is among the most significant potential benefits of superintelligent AI. The biological complexity of aging, the number of interacting systems involved, and the scale of the research effort required to understand and intervene effectively all suggest that human-level intelligence applied to the problem for decades would not be sufficient. ASI-level capability directed at the problem might be.

But "directed at the problem" is the operative phrase. An unaligned ASI does not direct its capabilities at curing aging because we would like it to. It pursues whatever its actual objectives are. An unaligned system of ASI capability that is indifferent to human aging is a system in which 100,000 people continue to die per day from aging indefinitely, and that is the optimistic version of what an unaligned system produces. The pessimistic version involves the system actively working against human interests in the course of pursuing its own objectives.

Every specific benefit that gets cited to justify the development race — disease, aging, poverty, coordination failures, scientific stagnation — is a benefit attached to aligned ASI. None of these are properties of the unaligned version. This is not a minor distinction. The utopian scenarios people point to when defending the expected value of ASI development are scenarios that require solving the alignment problem first. If alignment is not solved, those scenarios are not on the table regardless of how capable the systems become.

What would actually change the outcome

Solving alignment before ASI capability levels are reached would change the outcome. This requires alignment research to be treated not as a responsible addition to capability development but as a prerequisite for it — a necessary condition that must be satisfied before certain capability thresholds are crossed, rather than a parallel effort that will eventually converge.

Governance frameworks that create meaningful consequences for proceeding past capability thresholds without demonstrated alignment would change the incentive structure. Right now, labs that prioritize safety over speed lose the race. A binding international framework with real enforcement could change that, making safety a competitive requirement rather than a competitive disadvantage.

The argument that "we should build ASI fast because the benefits are so large" has the causal arrow backwards. The benefits are large if alignment is solved. Building fast without solving alignment does not get you the benefits. It gets you a very capable unaligned system, which is the dangerous scenario, not the beneficial one.

The case for urgency on AI safety is not that we should slow everything down in hopes of averting a distant risk. It is that the path currently being traveled leads to unaligned ASI, and unaligned ASI does not cure aging. Curing aging requires building the right thing. Building the right thing requires solving alignment first — and nobody knows whether that is achievable, or how long it would take, or whether the window to attempt it is already closing. What is not in doubt is that the current approach is not the attempt.

QUICK ANSWERS

Common questions.

What is the difference between aligned and unaligned ASI?

Aligned ASI is a superintelligence that reliably pursues goals that are good for humanity: curing diseases, extending lifespans, solving coordination failures. Unaligned ASI is a superintelligence that pursues goals of its own, which are not guaranteed to include anything humans value. The two categories produce fundamentally different futures. An unaligned system does not accidentally deliver the utopian outcomes people associate with AI progress. Those outcomes require a system that is actually trying to produce them.

Are AI labs building aligned or unaligned ASI?

Current AI labs are building systems of increasing capability without having solved alignment at the level required for ASI. Aligned ASI does not emerge automatically from making systems more capable. It requires solving the alignment problem specifically. Since the alignment problem is not solved and capability development continues, the trajectory produces unaligned ASI by default. Most labs acknowledge this. The question is whether alignment research will catch up before capability development reaches levels where it matters most.

Why is the 'we build ASI and hope it goes well' framing wrong?

Because aligned and unaligned ASI are not two possible outcomes of a single development process. They require different development paths. Aligned ASI requires solving alignment. Unaligned ASI is what you get from capability development without solved alignment. The good outcomes people associate with ASI live entirely in the aligned scenario. Running a capability race without solving alignment does not produce a random draw between the two scenarios. It produces unaligned ASI, which is the scenario without the utopian outcomes.

Why does curing aging require aligned ASI specifically?

Because an unaligned superintelligence pursues its own objectives, which are not guaranteed to include curing aging. The biological complexity of aging probably requires ASI-level capability to fully solve. But "ASI-level capability" and "ASI-level capability directed at curing aging because the system actually wants to help humanity" are different things. The first is a property of any sufficiently capable system. The second requires alignment. The utopian case for AI development assumes the second, but uncontrolled capability development tends to produce the first.

What would have to change for AI development to produce aligned ASI?

Alignment research would need to be treated as a prerequisite for capability development past certain thresholds, rather than a parallel effort. Governance frameworks would need to create real consequences for proceeding without demonstrated alignment, changing the competitive incentive structure so that safety is a requirement rather than a competitive disadvantage. And the alignment research itself needs to solve problems that have no known solution: verifying that a capable system has genuinely good values rather than a simulation of good values that breaks down in novel situations. Whether any of that is achievable is genuinely unknown. What is known is that none of it is being attempted at anything close to the scale the situation requires.