Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven't yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human "in the loop" and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human's intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent's learning (whereas an RL baseline fails due to catastrophic forgetting). However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe.
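The abstract describes the intervention scheme only at a high level. As a rough illustration, the following is a minimal sketch, with all class, function, and parameter names being hypothetical rather than taken from the paper, of how such a scheme could wrap a gym-style environment: an oracle (a human at first, later a supervised "blocker" trained on the logged decisions) inspects each proposed action, blocks it if it looks catastrophic, substitutes a safe action, and applies a penalty.

```python
# Hypothetical sketch of human-in-the-loop intervention for model-free RL,
# based only on the scheme described in the abstract above. Names and defaults
# (safe_action, penalty) are illustrative assumptions, not the paper's values.

from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class InterventionLog:
    """Records the overseer's decisions so a supervised blocker can be trained later."""
    observations: List[Any] = field(default_factory=list)
    actions: List[int] = field(default_factory=list)
    blocked: List[bool] = field(default_factory=list)


class InterventionWrapper:
    """Wraps an env exposing reset()/step() with an intervention oracle.

    `oracle(obs, action) -> bool` returns True if the action should be blocked.
    Initially this is a human; later it can be a classifier that imitates the
    human's logged intervention decisions.
    """

    def __init__(self, env, oracle: Callable[[Any, int], bool],
                 safe_action: int = 0, penalty: float = -10.0):
        self.env = env
        self.oracle = oracle
        self.safe_action = safe_action
        self.penalty = penalty
        self.log = InterventionLog()
        self._obs = None

    def reset(self):
        self._obs = self.env.reset()
        return self._obs

    def step(self, action: int) -> Tuple[Any, float, bool, dict]:
        block = self.oracle(self._obs, action)
        # Log every decision; this dataset is what a supervised blocker trains on.
        self.log.observations.append(self._obs)
        self.log.actions.append(action)
        self.log.blocked.append(block)

        if block:
            # Replace the flagged action with a safe one and penalize the agent,
            # so it learns to stop proposing catastrophic actions.
            obs, reward, done, info = self.env.step(self.safe_action)
            reward += self.penalty
            info = dict(info, intervened=True)
        else:
            obs, reward, done, info = self.env.step(action)
        self._obs = obs
        return obs, reward, done, info
```

In this sketch the `oracle` hook is the only part that changes over time: it can be swapped from a human decision function to a classifier trained on `wrapper.log`, which corresponds to the labor-reducing step described in the abstract.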
