Dangerous capabilities evaluations for AI

2023-12-02

"The world if we measure AI catastrophic risk", followed by a picture of a futuristic city. — The title slide of my talk.

In the last couple of weeks, I've given two talks on the subject of dangerous capabilities evaluations for AI, one to Toronto AI Safety and the other to the Wisconsin AI Safety Initiative.

In both talks, I discussed dangerous capabilities evaluations: tests for AI systems that check whether they can make it easier for individuals or small groups to develop biological, chemical, or nuclear weapons; make copies of themselves and gather resources; or manipulate humans into helping them. I made a case for the importance of these evaluations, covered progress on them in the past year (mostly ARC Evals' work), and pointed to future research directions that I'm excited about.

You can find the presentation slides here. I've included speaker notes with more details and links to resources.