Paper information - Approximate real-time optimal control based on sparse Gaussian process models

In this paper, we present a fully automated approach to (approximate) optimal control of non-linear systems. Our algorithm jointly learns a non-parametric model of the system dynamics, based on Gaussian process regression (GPR), and performs receding horizon control using an adapted iterative LQR formulation. This results in an extremely data-efficient learning algorithm that can operate under real-time constraints. When combined with an exploration strategy based on GPR variance, our algorithm successfully learns to control two benchmark problems in simulation (two-link manipulator, cart-pole) and to swing up and balance a real cart-pole system. For all considered problems, learning from scratch (that is, without prior knowledge provided by an expert) succeeds in fewer than 10 episodes of interaction with the system.
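The two learning ingredients named in the abstract, a GPR model of the dynamics and an exploration signal derived from the GPR predictive variance, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses exact (not sparse) GP regression, omits the iterative LQR controller entirely, and runs on a made-up one-dimensional toy system. The names `rbf_kernel`, `GPDynamicsModel`, `exploration_cost` and `bonus_weight`, as well as all hyperparameter values, are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


class GPDynamicsModel:
    """Exact GP regression from (state, action) inputs to a scalar state change.

    Hypothetical helper class; the paper uses a sparse GP approximation instead.
    """

    def __init__(self, length_scale=1.0, signal_var=1.0, noise_var=1e-4):
        self.length_scale = length_scale
        self.signal_var = signal_var
        self.noise_var = noise_var

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        K = rbf_kernel(self.X, self.X, self.length_scale, self.signal_var)
        K += self.noise_var * np.eye(len(self.X))
        # Cholesky factor of the noisy kernel matrix, reused for every prediction.
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, X_query):
        """Return the predictive mean and variance of the latent function."""
        X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
        K_s = rbf_kernel(X_query, self.X, self.length_scale, self.signal_var)
        mean = K_s @ self.alpha
        v = np.linalg.solve(self.L, K_s.T)
        var = self.signal_var - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)


def exploration_cost(task_cost, predictive_var, bonus_weight=1.0):
    """Lower the cost where the model is uncertain (assumed form of the bonus)."""
    return task_cost - bonus_weight * predictive_var


# Toy data: a made-up system whose state change is 0.1 * (u - sin(x)).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(30, 2))  # columns: state x, action u
y_train = 0.1 * (X_train[:, 1] - np.sin(X_train[:, 0]))

model = GPDynamicsModel().fit(X_train, y_train)

# Variance is small at a visited input and close to the prior far from the data.
_, var_near = model.predict(X_train[0])
_, var_far = model.predict([5.0, 5.0])
var_near, var_far = float(var_near[0]), float(var_far[0])
print("variance at a visited input:", var_near)
print("variance far from the data: ", var_far)
```

In a full controller, the predictive mean would supply the dynamics (and its derivatives) to the trajectory optimizer, while a variance term like the one in `exploration_cost` would bias planning toward poorly modeled regions during early episodes. How exactly the paper weights that term is not stated in the abstract, so `bonus_weight` here is purely a placeholder.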
