Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings

Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous, imperiling the RL agent, other agents, and the environment. To overcome this difficulty, we propose a "safety-critical adaptation" task setting: an agent first trains in non-safety-critical "source" environments such as in a simulator, before it adapts to the target environment where failures carry heavy costs. We propose a solution approach, CARL, that builds on the intuition that prior experience in diverse environments equips an agent to estimate risk, which in turn enables relative safety through risk-averse, cautious adaptation. CARL first employs model-based RL to train a probabilistic model to capture uncertainty about transition dynamics and catastrophic states across varied source environments. Then, when exploring a new safety-critical environment with unknown dynamics, the CARL agent plans to avoid actions that could lead to catastrophic states. In experiments on car driving, cartpole balancing, half-cheetah locomotion, and robotic object manipulation, CARL successfully acquires cautious exploration behaviors, yielding higher rewards with fewer failures than strong RL adaptation baselines. Website at this https URL.
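The abstract outlines CARL's two stages: learn an ensemble of probabilistic dynamics models (with catastrophe prediction) across diverse source environments, then plan risk-aversely in the safety-critical target environment. The sketch below illustrates one way such cautious planning could look, assuming a cross-entropy-method planner over an ensemble of learned models; the names (plan_cautiously, models, reward_fn, lambda_risk) and the worst-case risk penalty are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch of risk-averse ensemble-based planning in the spirit of CARL;
# an illustrative assumption, not the authors' exact implementation.
# `models`, `reward_fn`, and `lambda_risk` are hypothetical placeholders:
# each model maps (state, action) -> (next_state, catastrophe_probability).

import numpy as np


def plan_cautiously(models, reward_fn, state, horizon=10, candidates=500,
                    iters=5, n_elite=50, action_dim=2, lambda_risk=10.0, seed=0):
    """Cross-entropy-method planner: maximize mean predicted return across the
    ensemble while penalizing the worst-case predicted catastrophe probability."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))

    for _ in range(iters):
        # Sample candidate action sequences around the current plan distribution.
        seqs = rng.normal(mean, std, size=(candidates, horizon, action_dim))
        scores = np.empty(candidates)
        for i, seq in enumerate(seqs):
            returns, worst_risk = [], 0.0
            for model in models:  # roll out the sequence under every ensemble member
                s, total = state, 0.0
                for a in seq:
                    s, p_cat = model(s, a)
                    total += reward_fn(s, a)
                    worst_risk = max(worst_risk, p_cat)
                returns.append(total)
            # Risk-averse score: average return minus a heavy penalty on the
            # highest catastrophe probability predicted by any ensemble member.
            scores[i] = np.mean(returns) - lambda_risk * worst_risk
        # Refit the sampling distribution to the elite action sequences.
        elite = seqs[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6

    return mean[0]  # execute only the first action, then replan (MPC-style)
```

In this reading, epistemic uncertainty comes from disagreement among ensemble members, and caution comes from scoring a candidate plan by its worst predicted outcome rather than its average, so the agent avoids actions that any plausible model deems catastrophic.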
