
Future of Life Institute Podcast

Future of Life Institute


Why AI Evaluation Science Can't Keep Up (with Carina Prunkl)

Carina Prunkl is a researcher at Inria. She joins the podcast to discuss how to assess the capabilities and risks of general-purpose AI. We examine why systems can solve hard coding and math problems yet still fail at simple tasks, why pre-deployment tests often miss real-world behavior, and how faster capability gains can increase misuse risks. The conversation also covers de-skilling, red teaming, layered safeguards, and warning signs that AIs might undermine oversight.

LINKS:
Carina Prunkl personal website

CHAPTERS:
(00:00) Episode Preview
(01:04) Introducing the report
(02:10) Jagged frontier capabilities
(05:29) Formal reasoning progress
(12:36) Risks and evaluation science
(19:00) Funding evaluation capacity
(24:03) Autonomy and de-skilling
(31:32) Authenticity and AI companions
(41:00) Defense in depth methods
(48:34) Loss of control risks
(53:16) Where to read report

PRODUCED BY:
https://aipodcast.ing

SOCIAL LINKS:
Website: https://podcast.futureoflife.org
Twitter (FLI): https://x.com/FLI_org
Twitter (Gus): https://x.com/gusdocker
LinkedIn: https://www.linkedin.com/company/future-of-life-institute/
YouTube: https://www.youtube.com/channel/UC-rCCy3FQ-GItDimSR9lhzw/
Apple: https://geo.itunes.apple.com/us/podcast/id1170991978
Spotify: https://open.spotify.com/show/2Op1WO3gwVwCrYHg4eoGyP
