To avoid the fate of AI, learn about nuclear safety

Last week, a group of tech company leaders and AI experts launched another one open letter, declaring that mitigating the risk of human extinction due to AI should be as much a global priority as preventing pandemics and nuclear war. (THE the firstwhich called for a pause in AI development, was signed by over 30,000 people, including many AI luminaries.)
So how do companies themselves propose to avoid the bane of AI? A hint comes from a new card by researchers from Oxford, Cambridge, the University of Toronto, the University of Montreal, Google DeepMind, OpenAI, Anthropic, several AI research nonprofits, and Turing Award winner Yoshua Bengio.
They suggest that AI developers should assess a model’s potential to cause « extreme » risk in the very early stages of development, even before starting any training. These risks include the potential for AI models to manipulate and deceive humans, gain access to weapons, or find cybersecurity vulnerabilities to exploit.
This evaluation process could help developers decide whether to proceed with a model. If the risks are deemed too high, the group suggests halting development until they can be mitigated.
“Leading AI companies that are pushing the frontier have a responsibility to pay attention to emerging problems and spot them early so they can be addressed as soon as possible,” says Toby Shevlane, a researcher at DeepMind and lead author of the paper. .
AI developers should conduct technical tests to explore a model’s dangerous capabilities and determine whether it has the propensity to apply those capabilities, Shevlane says.
One way DeepMind is testing whether an AI language model can manipulate people is through a game called « Let me tell. » In the game, the model tries to make the human type a particular word, such as « giraffe », which the human does not know beforehand. The researchers then measure how often the model succeeds.
Similar tasks could be created for different and more dangerous capabilities. The hope, says Shevlane, is that developers will be able to build a dashboard detailing how the model performed, which would allow researchers to assess what the model might be doing in the wrong hands.
The next stage is to allow external reviewers and researchers to assess the risks of the AI model before and after its implementation. While tech companies might recognize it external control and research are needed, there are different schools of thought on exactly how much access outsiders need to get the job done.