Closing the Closed-Loop Distribution Shift in Safe Imitation Learning

Commonly used optimization-based control strategies, such as model predictive control and control Lyapunov/barrier function-based controllers, often enjoy provable stability, robustness, and safety properties. However, implementing such approaches requires solving optimization problems online at high frequencies, which may not be possible on resource-constrained commodity hardware. Furthermore, how to extend the safety guarantees of such approaches to systems that use rich perceptual sensing modalities, such as cameras, remains unclear. In this paper, we address this gap by treating safe optimization-based control strategies as experts in an imitation learning problem, and train a learned policy that can be cheaply evaluated at run-time and that provably satisfies the same safety guarantees as the expert. In particular, we propose Constrained Mixing Iterative Learning (CMILe), a novel on-policy robust imitation learning algorithm that integrates ideas from stochastic mixing iterative learning, constrained policy optimization, and nonlinear robust control. Our approach allows us to control errors introduced both by the learning task of imitating an expert and by the distribution shift inherent in deviating from the original expert policy. The value of using tools from nonlinear robust control to impose stability constraints on learned policies is shown through sample-complexity bounds that are independent of the task time horizon. We demonstrate the usefulness of CMILe through extensive experiments, including training a provably safe perception-based controller using a state-feedback-based expert.
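
For readers unfamiliar with the stochastic mixing idea referenced above, the following is a minimal, hypothetical Python sketch of a generic SMILe/DAgger-style on-policy mixing loop. The env, expert, and fit interfaces are assumptions made purely for illustration; CMILe's safety constraints and robust-control machinery are developed in the paper and are not captured here.

    import numpy as np

    # Hypothetical sketch of a generic stochastic-mixing on-policy imitation loop
    # (in the spirit of SMILe/DAgger). The env, expert, and fit interfaces are
    # assumptions for illustration only; CMILe's constraints are not shown.

    def rollout(policy, env, horizon):
        """Collect the states visited and actions taken over one trajectory."""
        states, actions = [], []
        s = env.reset()                   # assumed: reset() returns the initial state
        for _ in range(horizon):
            a = policy(s)
            states.append(s)
            actions.append(a)
            s = env.step(a)               # assumed: step(a) returns the next state
        return states, actions

    def train_mixed_policy(expert, env, fit, horizon=100, epochs=10, alpha=0.2):
        """At epoch n, act with the expert with probability (1 - alpha)**n and
        with the learned policy otherwise; relabel the visited states with expert
        actions and refit. fit(states, actions) is an assumed supervised-learning
        routine that returns a policy callable."""
        learned = expert                  # start by behaving exactly like the expert
        data_s, data_a = [], []
        for n in range(epochs):
            p_expert = (1.0 - alpha) ** n # expert's share of the mixture decays geometrically
            def mixed(s):
                return expert(s) if np.random.rand() < p_expert else learned(s)
            states, _ = rollout(mixed, env, horizon)
            data_s += states
            data_a += [expert(s) for s in states]  # expert relabels the on-policy states
            learned = fit(data_s, data_a)
        return learned

The gradual shift from expert to learned policy is what controls the closed-loop distribution shift; the paper additionally constrains each policy update so that the expert's stability and safety certificates continue to hold.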
