Safe reinforcement learning in high-risk tasks through policy improvement

Reinforcement Learning (RL) methods are widely used for dynamic control tasks. Many of these are high-risk tasks in which the trial-and-error process may select actions whose execution from unsafe states can be catastrophic. In addition, many such tasks have continuous state and action spaces, which makes the learning problem harder and intractable for conventional RL algorithms. When an agent begins to interact with a risky environment with a large state-action space, an important question therefore arises: how can we prevent exploration of the state-action space from damaging the learning system (or other systems)? In this paper, we define the concept of risk and address the problem of safe exploration in the context of RL. Our notion of safety concerns states that can lead to damage. Moreover, we introduce an algorithm that safely improves suboptimal but robust behaviors for continuous state and action control tasks, and that learns efficiently from the experience gathered from the environment. We report experimental results on the helicopter hovering task from the RL Competition.
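To make the general idea concrete, the sketch below illustrates one common pattern for safe exploration from a suboptimal but robust baseline: the agent perturbs a known-safe controller and falls back to it whenever the current state is judged risky, then uses the collected experience to improve the learned policy. This is a minimal illustrative sketch, not the paper's actual algorithm; the risk test, the baseline controller, the environment interface (`env.reset()`, `env.step()`), and all parameters are assumptions chosen for illustration.

```python
import numpy as np

def is_risky(state, risk_threshold=0.8):
    """Placeholder risk estimate: here, simple proximity to the state bounds."""
    return np.max(np.abs(state)) > risk_threshold

def baseline_policy(state):
    """Suboptimal but robust hand-crafted controller (assumed given).

    For this toy example the action and state dimensions are assumed equal.
    """
    return -0.5 * state  # simple proportional stabiliser

def exploratory_policy(state, learned_params, noise_scale=0.05):
    """Learned continuous-action policy plus small Gaussian exploration noise."""
    action = learned_params @ state
    return action + noise_scale * np.random.randn(*action.shape)

def safe_episode(env, learned_params, horizon=200):
    """Run one episode, deferring to the baseline controller in risky states."""
    state = env.reset()
    transitions = []
    for _ in range(horizon):
        if is_risky(state):
            action = baseline_policy(state)        # fall back to safe behaviour
        else:
            action = exploratory_policy(state, learned_params)
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return transitions  # later used to improve learned_params
```

In a control task such as helicopter hovering, the baseline would be a conservative stabilising controller and the learned policy would be refined episode by episode from the stored transitions, so exploration stays close to behaviour already known to be safe.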
