What Is an AI Safety Institute?

An AI Safety Institute is a government body set up to evaluate advanced AI systems for risks to national security and public safety, and to build the state's own expertise on frontier AI. The first were announced around the AI Safety Summit held at Bletchley Park in the UK in November 2023, and several major governments now run one or something equivalent, linked through an international network that shares methods and findings.

The core idea is straightforward and overdue. For years the only organisations that understood frontier models deeply were the companies building them. An AI Safety Institute is an attempt to put independent technical capability inside government, so that public authorities can assess these systems rather than take a developer's word for it.

What they actually do

Their work clusters into a few areas.

Testing frontier models, sometimes before release, for dangerous capabilities in domains like cyber, biology, and autonomy, using the methods behind capability evaluations.
Developing the science of evaluation itself, since how to measure these risks well is an unsolved research question, not a settled procedure.
Advising government, feeding technical assessments into policy so that decisions are informed by people who have actually examined the systems.
Coordinating internationally, so that a model tested in one country does not have to be re-tested from scratch everywhere, and so that standards begin to converge.

This is real institutional progress, and the Foundation welcomes it. Building state capacity to understand frontier AI is a precondition for governing it. You cannot regulate what you cannot evaluate, and until recently governments largely could not.

The power they mostly lack

Here is the catch. Most AI Safety Institutes can test, advise, and publish. Few can compel. Their access to models often depends on the cooperation of the labs, their findings usually inform rather than bind, and in most cases they cannot order a dangerous model held back. They are, by and large, bodies that can see the risk and recommend a response, not require one.

That gap between assessment and authority is the crucial limitation. An institute that discovers a serious hazard but can only advise is only as effective as its government's willingness to act on that advice in the face of commercial and competitive pressure to let development continue. It is the difference between a smoke detector and a fire brigade, and much of the current architecture consists of smoke detectors.

Testing without the power to act on the results is a warning system, not a safeguard.

What they could become

The value of AI Safety Institutes is that they build the thing binding governance would need: independent technical capacity, shared standards, and an international channel between governments. In that sense they are the embryo of the kind of body the Foundation argues for. Our piece on an international monitoring agency describes where this could lead, and the model of the Intergovernmental Panel on Climate Change (IPCC) shows how shared scientific assessment can underwrite international policy.

What has to change is the authority. Assessment needs to connect to enforcement, whether through domestic law that makes an institute's sign-off a condition of deployment, or through an international framework that gives verified findings real consequences. Give these institutes the power to match their expertise, and the scaffolding of serious governance is already partly built. That transition is the subject of our plan.