Risk assessment at AGI companies: A review of popular risk assessment techniques from other safety-critical industries

Companies like OpenAI, Google DeepMind, and Anthropic have the stated goal of building artificial general intelligence (AGI): AI systems that perform as well as or better than humans on a wide variety of cognitive tasks. However, there are increasing concerns that AGI would pose catastrophic risks. In light of this, AGI companies need to drastically improve their risk management practices. To support such efforts, this paper reviews popular risk assessment techniques from other safety-critical industries and suggests ways in which AGI companies could use them to assess catastrophic risks from AI. The paper discusses three risk identification techniques (scenario analysis, fishbone method, and risk typologies and taxonomies), five risk analysis techniques (causal mapping, Delphi technique, cross-impact analysis, bow tie analysis, and system-theoretic process analysis), and two risk evaluation techniques (checklists and risk matrices). For each of them, the paper explains how the technique works, suggests ways in which AGI companies could use it, discusses its benefits and limitations, and makes recommendations. Finally, the paper discusses when to conduct risk assessments, when to use which technique, and how to use any of them. The reviewed techniques will be familiar to risk management professionals in other industries, and they will not be sufficient to assess catastrophic risks from AI on their own. However, AGI companies should not skip the straightforward step of reviewing best practices from other industries.
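To make one of the risk evaluation techniques mentioned above concrete, the sketch below shows a minimal qualitative risk matrix in Python. The scale labels, scoring rule, and rating thresholds are illustrative assumptions, not taken from the paper; in practice an AGI company would calibrate these scales to its own risk appetite.

```python
# Minimal sketch of a qualitative risk matrix (a risk evaluation technique).
# All labels and thresholds below are illustrative assumptions.
LIKELIHOOD = ["rare", "unlikely", "possible", "likely", "almost certain"]
SEVERITY = ["negligible", "minor", "moderate", "major", "catastrophic"]

def rate_risk(likelihood: str, severity: str) -> str:
    """Map a (likelihood, severity) pair to a qualitative risk rating."""
    score = (LIKELIHOOD.index(likelihood) + 1) * (SEVERITY.index(severity) + 1)
    if score >= 15:
        return "high"    # escalate: mitigate before proceeding
    if score >= 6:
        return "medium"  # mitigate and monitor
    return "low"         # accept and review periodically

# Example: a "possible" misuse scenario with "catastrophic" severity
print(rate_risk("possible", "catastrophic"))  # -> high
```

Multiplying ordinal scale indices is a common but contested convention; many standards instead use a lookup table so that, for example, any "catastrophic" outcome is rated high regardless of likelihood.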
