An Overview of Catastrophic AI Risks

Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose catastrophic risks. Although numerous risks have been detailed separately, there is a pressing need for a systematic discussion and illustration of the potential dangers to better inform efforts to mitigate them. This paper provides an overview of the main sources of catastrophic AI risks, which we organize into four categories: malicious use, in which individuals or groups intentionally use AIs to cause harm; AI race, in which competitive environments compel actors to deploy unsafe AIs or cede control to AIs; organizational risks, highlighting how human factors and complex systems can increase the chances of catastrophic accidents; and rogue AIs, describing the inherent difficulty in controlling agents far more intelligent than humans. For each category of risk, we describe specific hazards, present illustrative stories, envision ideal scenarios, and propose practical suggestions for mitigating these dangers. Our goal is to foster a comprehensive understanding of these risks and inspire collective and proactive efforts to ensure that AIs are developed and deployed in a safe manner. Ultimately, we hope this will allow us to realize the benefits of this powerful technology while minimizing the potential for catastrophic outcomes.
