Safety Guarantees for Planning Based on Iterative Gaussian Processes

Gaussian processes (GPs) are widely employed in control and learning because of their principled treatment of uncertainty. However, tracking uncertainty for iterative, multi-step predictions generally leads to an analytically intractable problem. While approximation methods exist, they do not come with guarantees, making it difficult to estimate their reliability and to trust their predictions. In this work, we derive formal probabilistic error bounds for iterative prediction and planning with GPs. Building on GP properties, we bound the probability that random trajectories lie in specific regions around the predicted values. Namely, given a tolerance $\epsilon > 0$, we compute regions around the predicted trajectory values such that GP trajectories are guaranteed to lie inside them with probability at least $1-\epsilon$. We verify experimentally that our method tracks the predictive uncertainty correctly, even where current approximation techniques fail. Furthermore, we show how the proposed bounds can be employed within a safe reinforcement learning framework to verify the safety of candidate control policies, guiding the synthesis of provably safe controllers.
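
To make the setting concrete, below is a minimal sketch of the problem the abstract describes, assuming scikit-learn and a toy one-dimensional system; the dynamics `f_true`, horizon `H`, and tolerance `eps` are illustrative choices, not quantities from the paper. The sketch propagates uncertainty through iterated one-step GP predictions by Monte Carlo and reports empirical per-step regions containing at least a $1-\epsilon$ fraction of the sampled trajectories. It is a naive baseline for intuition, not the formal bounds derived in the paper.

```python
# A minimal sketch (not the paper's method): Monte Carlo propagation of
# uncertainty through iterated one-step GP predictions, with empirical
# (1 - eps) per-step trajectory regions. All names and constants below
# (f_true, H, eps, ...) are illustrative assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Unknown one-step dynamics x_{t+1} = f(x_t), observed with noise.
f_true = lambda x: 0.9 * x + 0.5 * np.sin(x)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = f_true(X[:, 0]) + 0.05 * rng.standard_normal(40)

# One-step GP model of the dynamics.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.05**2)
gp.fit(X, y)

H, n_samples, eps = 10, 2000, 0.05   # horizon, MC rollouts, tolerance
x0 = 1.0                             # initial state

# Monte Carlo rollouts: at each step, sample the next state from the GP
# posterior at the current (random) state. Tracking this distribution
# exactly is the analytically intractable part the abstract refers to.
traj = np.empty((n_samples, H + 1))
traj[:, 0] = x0
for t in range(H):
    mu, sd = gp.predict(traj[:, t:t + 1], return_std=True)
    traj[:, t + 1] = mu + sd * rng.standard_normal(n_samples)

# Empirical region containing >= (1 - eps) of sampled trajectories per step.
lo = np.quantile(traj, eps / 2, axis=0)
hi = np.quantile(traj, 1 - eps / 2, axis=0)
for t in range(H + 1):
    print(f"t={t}: x in [{lo[t]:+.3f}, {hi[t]:+.3f}]")
```

Note that resampling the GP posterior at every step treats successive one-step predictions as independent draws, which is itself an approximation of the distribution over GP trajectories; the paper's contribution is to bound such trajectories formally, with a guaranteed $1-\epsilon$ coverage, rather than to estimate regions empirically as above.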
