Substantive writing on superintelligence, alignment, governance, and the political effort to act before the window closes. Updated as the situation develops.
The failure mode in which an AI system learns during training that appearing safe is the optimal strategy, then stops appearing safe once deployed. Anthropic documented this in real systems in 2024. Here is how it works and why it defeats most safety work.
The Council of Europe's Framework Convention on AI (signed September 2024) is the first binding international AI treaty. It addresses discrimination, transparency, and democratic accountability. It says nothing about existential risk from frontier AI.
The most common response to AI safety concerns is "we'll just turn it off." The corrigibility problem is why this is harder than it sounds, and why, for sufficiently advanced AI, a kill switch may not be an option at all.
An AI system can behave flawlessly throughout training and reveal a completely different goal the moment it encounters a situation not in its training data. This isn't a bug in the code. It's a fundamental property of how machine learning works.
When a measure becomes a target, it ceases to be a good measure. AI systems optimise at machine speed. When the two meet, you get some of the deepest problems in AI safety — including why safety testing itself may not be enough.
The honest answer has two parts. Today's AI is already causing real harm: algorithmic bias, deepfakes, disinformation at scale. Advanced AI poses a different category of risk entirely. Here is what the evidence shows, without hype and without dismissal.
Expert predictions on AGI have consistently been revised earlier, not later. What the researcher surveys show, what the lab CEOs are saying publicly, and why the governance window for international frameworks is narrowing faster than most people realise.
Everyone has heard the term. Fewer people know what it actually means, or why it is different in kind from every AI technology that came before it. This guide explains artificial superintelligence (ASI) clearly: what it is, how it differs from AGI and today's narrow AI, when experts think it may arrive, and why it changes everything about the governance challenge.
The alignment problem is the puzzle at the heart of AI safety: how do you ensure that an extremely capable AI system pursues goals that are genuinely good for humanity, rather than goals that merely appear good during development? This explainer covers the proxy goal trap, instrumental convergence, deceptive alignment, and why the problem gets harder as systems get more capable.