DeepMind's AGI Safety Report: Predicting Risks and Mitigation Strategies
Google DeepMind recently released an extensive paper outlining its approach to the safety of Artificial General Intelligence (AGI), defined as AI capable of performing any task a human can. The topic remains contentious: some dismiss AGI as a distant dream, while others, such as Anthropic, warn that it could arrive soon and cause severe harm.
DeepMind's 145-page document, co-authored by DeepMind co-founder Shane Legg, estimates that AGI could arrive by 2030 and warns that it could cause "severe harm," including "existential risks" that could "permanently destroy humanity." The paper defines an "Exceptional AGI" as a system matching the capabilities of the top 1% of skilled adults across a wide range of tasks, including learning new skills.
DeepMind's Stance on AGI Safety
The paper contrasts DeepMind's AGI risk mitigation approach with those of Anthropic and OpenAI. DeepMind suggests that Anthropic may place too little emphasis on "robust training, monitoring, and security," while OpenAI may be overly optimistic about "automating" AI safety research. DeepMind also expresses skepticism about superintelligent AI, questioning whether it will arrive soon without "significant architectural innovation." However, the paper acknowledges the plausibility of "recursive AI improvement," a feedback loop in which AI enhances itself, which it warns could lead to dangerous outcomes.
DeepMind proposes techniques to prevent unauthorized access to AGI, improve understanding of AI actions, and harden the environments in which AI operates. While acknowledging that many of these techniques are still nascent, the paper stresses the importance of addressing these safety challenges proactively. "The transformative nature of AGI has the potential for both incredible benefits as well as severe harms," the authors state, urging AI developers to prioritize harm mitigation.
Expert Opinions and Counterarguments
Not all experts agree with DeepMind's assessment. Heidy Khlaaf from the AI Now Institute considers the concept of AGI too ambiguous for scientific evaluation. Matthew Guzdial from the University of Alberta doubts the feasibility of recursive AI improvement, while Sandra Wachter from Oxford highlights the danger of AI reinforcing itself with inaccurate outputs, particularly with the proliferation of generative AI.
Wachter argues that chatbots, increasingly used for search and truth-finding, risk feeding users "mistruths" presented in convincing ways. Despite its thoroughness, DeepMind's paper is unlikely to settle the ongoing debates over how realistic AGI is and which areas of AI safety demand the most urgent attention.
Source: TechCrunch