Chance-Constrained Trajectory Optimization for Safe Exploration and Learning of Nonlinear Systems

Learning-based control algorithms require data collection with abundant supervision for training. Safe exploration algorithms ensure the safety of this data collection process even when only partial knowledge is available. We present a new approach for optimal motion planning with safe exploration that integrates chance-constrained stochastic optimal control with dynamics learning and feedback control. We derive an iterative convex optimization algorithm that solves an Information-cost Stochastic Nonlinear Optimal Control problem (Info-SNOC). The optimization objective encodes a control cost for performance and an exploration cost for learning, and safety is incorporated as distributionally robust chance constraints. The dynamics are predicted by a robust regression model that is learned from data. The Info-SNOC algorithm is used to compute a pool of sub-optimal, safe motion plans that aid exploration for learning unknown residual dynamics under safety constraints. A stable feedback controller is used to execute the motion plan and collect data for model learning. We prove that rollouts from our exploration method are safe and that uncertainty decreases over epochs, thereby guaranteeing the consistency of our learning method. We validate the effectiveness of Info-SNOC by designing and implementing a pool of safe trajectories for a planar robot. We demonstrate that our approach has a higher success rate in ensuring safety than a deterministic trajectory optimization approach.
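To illustrate what a distributionally robust chance constraint looks like in practice, the sketch below uses the standard one-sided Chebyshev (Cantelli) tightening for a single linear constraint a^T x <= b when only the mean and covariance of x are known. This is a generic textbook form and not necessarily the exact formulation used in Info-SNOC; the function names, the example constraint, and the numerical values are all hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def dr_margin(a, sigma, eps):
    """Back-off that makes P(a^T x <= b) >= 1 - eps hold for every
    distribution of x with covariance sigma (Cantelli bound)."""
    kappa = math.sqrt((1.0 - eps) / eps)
    return kappa * math.sqrt(float(a @ sigma @ a))


def gaussian_margin(a, sigma, eps):
    """Back-off for the same constraint if x is assumed to be Gaussian."""
    kappa = NormalDist().inv_cdf(1.0 - eps)
    return kappa * math.sqrt(float(a @ sigma @ a))


def is_dr_safe(a, b, mu, sigma, eps):
    """Deterministic surrogate: a^T mu + margin <= b."""
    return float(a @ mu) + dr_margin(a, sigma, eps) <= b


# Hypothetical example: keep the first state coordinate below 1.0
# with a risk budget of 5 percent.
a = np.array([1.0, 0.0])
b = 1.0
sigma = 0.01 * np.eye(2)
eps = 0.05

print(is_dr_safe(a, b, np.array([0.5, 0.0]), sigma, eps))
print(dr_margin(a, sigma, eps), gaussian_margin(a, sigma, eps))
```

The distribution-free margin is larger than the Gaussian one for the same risk level, which is the price paid for not assuming a particular noise distribution. Because the surrogate is linear in the mean, it can be used inside a convex trajectory optimization step.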
