On Controllability of AI

The invention of artificial general intelligence is predicted to cause a shift in the trajectory of human civilization. In order to reap the benefits and avoid the pitfalls of such a powerful technology, it is important to be able to control it. However, the possibility of controlling artificial general intelligence and its more advanced version, superintelligence, has not been formally established. In this paper, we present arguments, as well as supporting evidence from multiple domains, indicating that advanced AI cannot be fully controlled. The consequences of the uncontrollability of AI are discussed with respect to the future of humanity and to research on AI, including AI safety and security.
