A recurring problem in AI governance is that governments have historically lacked the in-house expertise to evaluate the systems they are meant to regulate. The safety institutes exist to close that gap. They are government or government-backed bodies staffed with technical researchers whose job is to assess the capabilities and risks of frontier models — the state finally building the ability to see what the labs are building, rather than taking their word for it.
Where the network came from
The United Kingdom announced its AI Safety Institute, the first of its kind, around the Bletchley summit in November 2023. The United States established its own institute at NIST shortly after. At the Seoul summit in May 2024, a wider group of governments (including Japan, Singapore, Canada, South Korea, the EU, and others) agreed to build institutes and connect them into an international network to align their work. What began as two national bodies became the seed of a cooperative system.
What the institutes actually do
- Evaluate frontier models. They test advanced systems for dangerous capabilities — including in high-stakes domains — developing the methods to measure what a model can actually do.
- Develop the science of evaluation. Rigorous, standardised testing of AI is an unsolved technical problem. The institutes are building the methodology that reliable governance requires.
- Advise governments. They give states independent technical judgment about AI risks, reducing dependence on the developers' own assessments.
- Cooperate across borders. Through the network, institutes share methods, coordinate testing, and work toward common standards — the beginnings of interoperable, international evaluation.
Why they matter for a treaty
The connection between the institutes and a future binding agreement is direct and underappreciated. Every serious proposal for an AI treaty depends on the ability to evaluate whether a system crosses a dangerous threshold and to verify compliance with safety obligations. That ability does not exist by default; it has to be built. The safety institutes are building exactly that. They are, in effect, constructing the technical organs that a verification regime would later use.
They also solve a sequencing problem. A treaty that specified evaluation requirements would be worthless if no independent body could perform the evaluations. By developing that capacity now — before the treaty exists — the institutes ensure the tools are ready when the political agreement arrives. The network is the practical groundwork on which binding governance can eventually stand, and it is being laid quietly while the higher-profile diplomacy proceeds in parallel.
The limits to be honest about
The institutes are not a substitute for binding governance, and it would be a mistake to treat them as one. Their access to models generally depends on the voluntary cooperation of developers; they cannot compel testing of a system a company declines to share. Their remits and resources vary widely between countries, and the network's coordination is still nascent and non-binding. And evaluation itself remains scientifically immature — reliably determining whether a model is dangerous is genuinely hard, and current methods have real gaps.
A treaty that requires safety evaluations is empty if no one can perform them. The safety institutes are building the eyes that any real verification regime will need. That work has to happen before the treaty, not after.
Naoto Nakada, Founder · Nakada Foundation to Save Humanity
The infrastructure of enforceable governance
The safety institutes represent a shift from talking about AI risk to building the capacity to measure it. That shift matters because governance without measurement is just aspiration. The realistic path is for the network to mature — gaining guaranteed access to frontier models through legal mandate rather than goodwill, converging on shared standards, and building the independent, well-resourced evaluation capacity that a verification regime requires. The institutes will not save the world on their own. But they are assembling the instruments without which no binding treaty could function, and strengthening them is among the most concrete and immediately useful things governments can do while the larger agreement is still being negotiated.
Common questions
What is an AI safety institute?
A government or government-backed body staffed with technical researchers whose job is to evaluate the capabilities and risks of frontier AI models. The UK created the first in November 2023, followed by the US at NIST. They test advanced systems for dangerous capabilities, develop the science of AI evaluation, and give governments independent technical judgment rather than relying on developers' own assessments.
What is the international network of safety institutes?
A cooperative system, agreed at the May 2024 Seoul summit, linking national institutes, including those of the UK, US, Japan, Singapore, Canada, South Korea, and the EU, so they can share evaluation methods, coordinate testing, and work toward common standards. It turns isolated national bodies into the beginnings of an interoperable, international evaluation capacity.
Why do the institutes matter for a treaty?
Because every serious AI treaty proposal depends on being able to evaluate whether a system crosses a dangerous threshold and to verify compliance. That capacity does not exist by default; it has to be built. The institutes are building exactly the technical evaluation capability a verification regime would use, ensuring the tools are ready when a political agreement arrives. They are the practical groundwork for enforceable governance.
What are the institutes' limits?
Their access to models often depends on developers' voluntary cooperation, so they cannot always compel testing; their resources and remits vary widely between countries; the network's coordination is still non-binding and nascent; and the science of evaluation is immature, with real gaps in reliably determining whether a model is dangerous. They are essential infrastructure but not a substitute for binding governance.