Avoiding Side Effects in Complex Environments

Reward function specification can be difficult, even in simple environments. Realistic environments contain millions of states. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoids side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway's Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead, completes the specified task, and avoids side effects.
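As a rough illustration of the shaping mechanism described above (a minimal sketch, not the paper's exact implementation), the AUP reward penalizes how much an action changes the agent's attainable value for an auxiliary goal relative to doing nothing. The function name `aup_reward`, the coefficient `lam`, and the scalar inputs are illustrative assumptions; in the paper the single auxiliary reward function is defined over a learned low-dimensional encoding of observations and its Q-values are learned, whereas here they are abstracted as scalars.

```python
def aup_reward(r_task, q_aux_action, q_aux_noop, lam=0.01):
    """Hedged sketch of AUP reward shaping.

    r_task        -- reward from the specified (primary) task
    q_aux_action  -- Q_aux(s, a): attainable auxiliary value after the chosen action
    q_aux_noop    -- Q_aux(s, no-op): attainable auxiliary value after inaction
    lam           -- penalty coefficient (value here is an assumption, not the paper's schedule)

    The penalty is the shift in attainable auxiliary value relative to
    inaction, normalized by the no-op value so it is roughly unit-free.
    """
    scale = max(abs(q_aux_noop), 1e-8)  # guard against division by zero
    penalty = abs(q_aux_action - q_aux_noop) / scale
    return r_task - lam * penalty
```

Under this sketch, actions that neither increase nor decrease the agent's ability to achieve the auxiliary goal incur no penalty, so the agent can still complete the specified task while being discouraged from disruptive side effects.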
