Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees

Safety is a critical component of autonomous systems and remains a key challenge for deploying learning-based policies in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to safely close the reality gap. To improve safety, we use a dual-policy setup in which a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the reach-avoid Bellman equation based on Hamilton-Jacobi reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme that shields unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments, including a photo-realistic one. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.
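To make the pipeline concrete, the sketches below illustrate the three ingredients named in the abstract. They are illustrative reconstructions under stated assumptions, not the authors' exact formulations or code.

First, the backup (safety) critic is trained against a reach-avoid fixed point. One discounted form, following the reach-avoid reinforcement learning formulation of Hsu et al. (RSS 2021), assumes the sign convention that the target margin satisfies ℓ(s) > 0 inside the target set and the safety margin satisfies g(s) > 0 outside the failure set, so that V(s) > 0 certifies reaching the target without entering the failure set as γ → 1:

```latex
% Discounted reach-avoid Bellman equation (sketch; sign conventions as stated above)
V(s) \;=\; (1-\gamma)\,\min\{\ell(s),\, g(s)\}
      \;+\; \gamma\,\min\Big\{\, g(s),\; \max\big\{\, \ell(s),\; \max_{a \in \mathcal{A}} V\big(f(s,a)\big) \,\big\} \Big\}
```

Second, during Sim-to-Lab transfer the backup policy shields the performance policy: the performance policy proposes an action, a learned safety critic scores it, and the backup policy overrides whenever the critic predicts a constraint violation. A minimal sketch, with all names (shielded_action, safety_critic, threshold) hypothetical rather than the paper's API:

```python
def shielded_action(obs, perf_policy, backup_policy, safety_critic, threshold=0.0):
    """Supervisory control: override the task action when it is predicted unsafe.

    Convention assumed here: safety_critic(obs, action) > threshold means the
    critic predicts the action eventually leads into the failure set.
    """
    action = perf_policy(obs)                   # task-reward-driven proposal
    if safety_critic(obs, action) > threshold:  # predicted future violation
        action = backup_policy(obs)             # shield: fall back to safety policy
    return action


# Toy usage on a 1-D state with stand-in callables (illustrative only):
perf = lambda s: 1.0                            # always drive forward
backup = lambda s: -1.0                         # retreat
critic = lambda s, a: s + 0.5 * a               # larger value = closer to failure
print(shielded_action(0.2, perf, backup, critic))  # 0.7 > 0.0, so the backup acts: -1.0
```

Third, for Lab-to-Real transfer, PAC-Bayes bounds certify generalization to unseen environments. With a posterior distribution P over policies, a prior P_0 fixed before Lab training, N training environments, and a success (or safety) indicator in [0, 1], one standard bound in this family (the McAllester/Maurer form shown here; the paper's exact bound may differ) holds with probability at least 1 − δ over the draw of training environments:

```latex
% PAC-Bayes lower bound on expected success in novel environments (sketch)
\mathbb{E}_{E \sim \mathcal{D}}\big[ R_E(P) \big] \;\ge\;
\frac{1}{N}\sum_{i=1}^{N} \hat{R}_{E_i}(P)
\;-\; \sqrt{\frac{\mathrm{KL}(P \,\|\, P_0) + \ln\frac{2\sqrt{N}}{\delta}}{2N}}
```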
