Paper Information - An Overview of Catastrophic AI Risks

An Overview of Catastrophic AI Risks

Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose catastrophic risks. Although numerous risks have been detailed separately, there is a pressing need for a systematic discussion and illustration of the potential dangers to better inform efforts to mitigate them. This paper provides an overview of the main sources of catastrophic AI risks, which we organize into four categories: malicious use, in which individuals or groups intentionally use AIs to cause harm; AI race, in which competitive environments compel actors to deploy unsafe AIs or cede control to AIs; organizational risks, highlighting how human factors and complex systems can increase the chances of catastrophic accidents; and rogue AIs, describing the inherent difficulty in controlling agents far more intelligent than humans. For each category of risk, we describe specific hazards, present illustrative stories, envision ideal scenarios, and propose practical suggestions for mitigating these dangers. Our goal is to foster a comprehensive understanding of these risks and inspire collective and proactive efforts to ensure that AIs are developed and deployed in a safe manner. Ultimately, we hope this will allow us to realize the benefits of this powerful technology while minimizing the potential for catastrophic outcomes.

Dan Hendrycks | Mantas Mazeika | Thomas Woodside
