Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

In many environments, only a relatively small subset of the complete state space is needed to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of exploration required, while retaining a performance bound that efficiently trades off the rate of convergence against a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.
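
To make the mechanism concrete, below is a minimal sketch of how an e-stop reset might be wired around a gym-style environment: the episode is cut short whenever the agent strays outside a set of supported states, here approximated by proximity to states visited in observed demonstrations. The wrapper name, nearest-neighbor criterion, radius, and penalty are illustrative assumptions (and the classic 4-tuple gym step API is assumed); this is not the paper's exact implementation.

```python
import numpy as np
import gym


class EStopWrapper(gym.Wrapper):
    """Terminate ('e-stop') any episode that leaves a set of supported states."""

    def __init__(self, env, demo_states, radius=0.5, estop_penalty=0.0):
        super().__init__(env)
        # States visited in observed demonstrations, shape (num_states, obs_dim).
        self.demo_states = np.asarray(demo_states, dtype=np.float64)
        self.radius = radius                # max distance to the nearest demo state
        self.estop_penalty = estop_penalty  # reward assigned when an e-stop fires

    def _in_support(self, obs):
        # Nearest-neighbor distance from the current observation to the demo states.
        dists = np.linalg.norm(self.demo_states - np.asarray(obs, dtype=np.float64), axis=1)
        return dists.min() <= self.radius

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if not self._in_support(obs):
            # The agent has left the states the demonstrator ever needed, so
            # exploration here is cut off and the episode ends (triggering a reset).
            reward = self.estop_penalty
            done = True
            info["e_stop"] = True
        return obs, reward, done, info


# Hypothetical usage: wrap any environment with demonstration states collected offline.
# env = EStopWrapper(gym.make("MountainCarContinuous-v0"), demo_states)
```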
