Decision Support for Safe AI Design

There is considerable interest in ethical designs for artificial intelligence (AI) that do not pose risks to humans. This paper proposes using elements of Hutter's agent-environment framework to define a decision support system for simulating, visualizing, and analyzing AI designs in order to understand their consequences. The simulations need not be accurate predictions of the future; rather, they show the futures that an agent design predicts will fulfill its motivations, and these futures can be explored by AI designers to find risks to humans. To show that a simulation model can be created safely, the paper proves that the most probable finite stochastic program explaining a finite history is finitely computable, and that there is an agent that performs this computation without any unintended instrumental actions. It also discusses the risks of running an AI in a simulated environment.
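The claim that the most probable finite stochastic program explaining a finite history is finitely computable can be illustrated, in a drastically simplified form, by exhaustive search over a finite, quantized model family scored by a description-length prior times the likelihood of the history. The Bernoulli family, the `grid` quantization, and all function names below are illustrative assumptions for this sketch, not the paper's actual construction:

```python
import math

def history_likelihood(history, p):
    """Likelihood of a binary history under an i.i.d. Bernoulli(p) model."""
    return math.prod(p if bit == 1 else 1.0 - p for bit in history)

def most_probable_model(history, grid=16):
    """Brute-force search over a finite, quantized family of stochastic
    models for the one maximizing prior(model) * P(history | model).
    The prior 2^-k favors models whose description needs fewer bits k;
    here every candidate costs the same log2(grid) bits, so the search
    reduces to maximum likelihood within the family."""
    best_p, best_score = None, -1.0
    desc_bits = math.log2(grid)          # bits to specify one candidate
    for k in range(1, grid + 1):
        p = k / (grid + 1)               # quantized Bernoulli parameter
        score = 2.0 ** (-desc_bits) * history_likelihood(history, p)
        if score > best_score:
            best_p, best_score = p, score
    return best_p, best_score
```

Because the candidate set is finite and each likelihood is a finite product, the whole search terminates; the paper's result concerns the analogous (much larger) search over finite stochastic programs.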
