Dangerous capabilities evaluations for AI
Over the past couple of weeks, I've given two talks on dangerous capabilities evaluations for AI, one to Toronto AI Safety and the other to the Wisconsin AI Safety Initiative.
In both talks, I discussed dangerous capabilities evaluations: tests for AI systems that check whether they can make it easier for individuals or small groups to develop biological, chemical, or nuclear weapons; make copies of themselves and gather resources; or manipulate humans into helping them. I made a case for the importance of these evaluations, covered progress on them over the past year (mostly ARC Evals' work), and pointed to future research directions that I'm excited about.
You can find the presentation slides here. I've included speaker notes with more details and links to resources.