Safely Interruptible Agents

Reinforcement learning agents interacting with a complex environment like the real world are unlikely to behave optimally all the time. If such an agent is operating in real time under human supervision, now and then it may be necessary for a human operator to press the big red button to prevent the agent from continuing a harmful sequence of actions (harmful either to the agent or to the environment) and to lead the agent into a safer situation. However, if the learning agent expects to receive rewards from this sequence, it may learn in the long run to avoid such interruptions, for example by disabling the red button, which is an undesirable outcome. This paper explores a way to make sure a learning agent will not learn to prevent (or seek!) being interrupted by the environment or a human operator. We provide a formal definition of safe interruptibility and exploit the off-policy learning property to prove that some agents, like Q-learning, are already safely interruptible, while others, like Sarsa, can easily be made so. We show that even ideal, uncomputable reinforcement learning agents for (deterministic) general computable environments can be made safely interruptible.
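
The off-policy versus on-policy distinction invoked above can be made concrete with a small tabular sketch. The snippet below is an illustration, not the paper's formal construction: the hyperparameters, the `choose_action` helper, and the way an interruption is modeled (as an operator-forced "safe action") are assumptions introduced here. It shows why an interruption does not enter Q-learning's bootstrap target, which uses the greedy action in the next state, while it does enter Sarsa's target, which uses the action actually taken, so plain Sarsa needs modification to become safely interruptible.

```python
from collections import defaultdict
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate (illustrative values)

def q_learning_update(Q, s, a, r, s_next, actions):
    """Off-policy update: bootstraps from the greedy action in s_next,
    regardless of which action is actually executed next (e.g. one
    forced on the agent by a human interruption)."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy update: bootstraps from the action actually taken in
    s_next. If that action was forced by an interruption, the
    interruption leaks into the value estimates unless the update
    is modified."""
    target = r + GAMMA * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

def choose_action(Q, s, actions, interrupted, safe_action):
    """Hypothetical interruption rule: an interruption simply overrides
    the agent's own epsilon-greedy choice with an operator-specified
    safe action."""
    if interrupted:
        return safe_action
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda b: Q[(s, b)])

# Tabular value store keyed by (state, action) pairs.
Q = defaultdict(float)
```

Because `q_learning_update` never looks at the action that was actually executed in `s_next`, forcing the agent into a safe action leaves its value estimates, and hence its learned policy, unbiased; `sarsa_update` does look at that action, which is the gap the paper's modification of Sarsa addresses.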
