Safe Policy Search Using Gaussian Process Models

We propose a data-efficient method for optimising the parameters of a policy that will be used to safely perform a given task. We train a Gaussian process model to capture the system dynamics, based on the PILCO framework. The model has useful analytic properties, which allow closed-form computation of error gradients and of the probability of violating given state-space constraints. Even during training, only policies that are deemed safe are implemented on the real system, minimising the risk of catastrophic failure.
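
The abstract states that the probability of violating state-space constraints can be computed in closed form from the model's Gaussian state predictions. As a rough illustration only (not the paper's actual formulation), the sketch below shows how such a probability could be evaluated for a single state dimension with an interval constraint, and how it might gate whether a candidate policy is deployed on the real system; the function name, the example numbers, and the 0.05 threshold are all hypothetical.

```python
import numpy as np
from scipy.stats import norm

def violation_probability(mean, var, lower, upper):
    """Probability that a Gaussian-distributed state dimension
    falls outside the safe interval [lower, upper]."""
    std = np.sqrt(var)
    p_inside = norm.cdf(upper, loc=mean, scale=std) - norm.cdf(lower, loc=mean, scale=std)
    return 1.0 - p_inside

# Example: the dynamics model predicts a state with mean 0.8 and
# variance 0.04, and the safe region is the interval [-1, 1].
p_violate = violation_probability(mean=0.8, var=0.04, lower=-1.0, upper=1.0)

# A candidate policy could be withheld from the real system if its
# predicted violation probability exceeds a chosen tolerance
# (the threshold below is a hypothetical value, not from the paper).
SAFETY_THRESHOLD = 0.05
is_safe = p_violate < SAFETY_THRESHOLD
print(f"P(violation) = {p_violate:.4f}, safe to deploy: {is_safe}")
```

Because the per-step state predictions are Gaussian, this kind of tail-probability calculation stays analytic, which is what makes it usable inside gradient-based policy search.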
