Adaptive Probabilistic Trajectory Optimization via Efficient Approximate Inference

Robotic systems must make decisions quickly and robustly when operating in uncertain and dynamic environments. Reinforcement Learning (RL) can compute optimal policies with little prior knowledge about the environment, but it suffers from slow convergence. Model Predictive Control (MPC), in contrast, optimizes policies quickly but requires accurate models of the system dynamics and environment. In this paper we propose a new approach, adaptive probabilistic trajectory optimization, that combines the benefits of RL and MPC. Our method uses scalable approximate inference to learn and update probabilistic models in an online, incremental fashion while computing optimal control policies via successive local approximations. We present two variants of our algorithm based on the Sparse Spectrum Gaussian Process (SSGP) model, and we evaluate them on three learning tasks, demonstrating the effectiveness and efficiency of our approach.
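
To make the model-learning side of this pipeline concrete, the sketch below shows how a Sparse Spectrum GP can be fit incrementally: the kernel is approximated with random Fourier features and the resulting weight-space Bayesian linear regression is updated one observation at a time. This is a minimal illustration of the model class under stated assumptions (an RBF kernel with fixed hyperparameters), not the paper's implementation; the class and parameter names (SSGPRegressor, n_features, lengthscale, etc.) are hypothetical and chosen only for the example.

```python
import numpy as np


class SSGPRegressor:
    """Minimal Sparse Spectrum GP regression via random Fourier features.

    Illustrative sketch only: RBF kernel with fixed hyperparameters,
    weight-space Bayesian linear regression updated one sample at a time.
    """

    def __init__(self, input_dim, n_features=50, lengthscale=1.0,
                 signal_var=1.0, noise_var=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Spectral frequencies drawn from the RBF kernel's spectral density.
        self.W = rng.normal(0.0, 1.0 / lengthscale, size=(n_features, input_dim))
        self.b = rng.uniform(0.0, 2 * np.pi, size=n_features)
        self.coef = np.sqrt(2.0 * signal_var / n_features)
        self.noise_var = noise_var
        # Sufficient statistics of the Gaussian posterior over feature weights.
        self.A = np.eye(n_features)        # posterior precision (prior = identity)
        self.Phi_y = np.zeros(n_features)  # accumulated Phi^T y / noise_var

    def _features(self, X):
        # phi(x) = sqrt(2 sigma_f^2 / m) cos(W x + b) approximates the RBF kernel.
        return self.coef * np.cos(X @ self.W.T + self.b)

    def update(self, x, y):
        """Incremental (rank-1) update with a single observation (x, y)."""
        phi = self._features(np.atleast_2d(x))[0]
        self.A += np.outer(phi, phi) / self.noise_var
        self.Phi_y += phi * y / self.noise_var

    def predict(self, X):
        """Predictive mean and variance at test inputs X."""
        Phi = self._features(np.atleast_2d(X))
        A_inv = np.linalg.inv(self.A)
        mean = Phi @ (A_inv @ self.Phi_y)
        var = self.noise_var + np.sum((Phi @ A_inv) * Phi, axis=1)
        return mean, var


# Usage: learn a 1-D dynamics-like mapping online, one sample at a time.
model = SSGPRegressor(input_dim=1, n_features=100, lengthscale=0.5)
xs = np.random.uniform(-3, 3, size=(200, 1))
ys = np.sin(xs[:, 0]) + 0.1 * np.random.randn(200)
for x, y in zip(xs, ys):
    model.update(x, y)
mu, var = model.predict(np.linspace(-3, 3, 5).reshape(-1, 1))
```

Because the posterior is maintained through fixed-size sufficient statistics, each update costs the same regardless of how much data has been seen, which is what makes this family of models attractive for online, incremental learning of dynamics.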
