Future of Life Institute Podcast
Future of Life Institute
Podcast
Episodes
Listen, download, subscribe
Why AI Evaluation Science Can't Keep Up (with Carina Prunkl)
Carina Prunkl is a researcher at Inria. She joins the podcast to discuss how to assess the capabilities and risks of general-purpose AI. We examine why systems can solve hard coding and math problems yet still fail at simple tasks, why pre-deployment tests often miss real-world behavior, and how faster capability gains can increase misuse risks. The conversation also covers de-skilling, red teaming, layered safeguards, and warning signs that AIs might undermine oversight. LINKS: Carina Prunkl personal website CHAPTERS: (00:00) Episode Preview (01:04) Introducing the report (02:10) Jagged frontier capabilities (05:29) Formal reasoning progress (12:36) Risks and evaluation science (19:00) Funding evaluation capacity (24:03) Autonomy and de-skilling (31:32) Authenticity and AI companions (41:00) Defense in depth methods (48:34) Loss of control risks (53:16) Where to read report PRODUCED BY: https://aipodcast.ing SOCIAL LINKS: Website: https://podcast.futureoflife.org Twitter (FLI): https://x.com/FLI_org Twitter (Gus): https://x.com/gusdocker LinkedIn: https://www.linkedin.com/company/future-of-life-institute/ YouTube: https://www.youtube.com/channel/UC-rCCy3FQ-GItDimSR9lhzw/ Apple: https://geo.itunes.apple.com/us/podcast/id1170991978 Spotify: https://open.spotify.com/show/2Op1WO3gwVwCrYHg4eoGyP
Future of Life Institute Podcast RSS Feed
