Unpredictability of AI

The young field of AI Safety is still in the process of identifying its challenges and limitations. In this paper, we formally describe one such impossibility result, namely the Unpredictability of AI. We prove that it is impossible to precisely and consistently predict what specific actions a smarter-than-human intelligent system will take to achieve its objectives, even if we know the terminal goals of the system. In conclusion, the impact of Unpredictability on AI Safety is discussed.
