Model Predictive Path Integral Control using Covariance Variable Importance Sampling

In this paper we develop a Model Predictive Path Integral (MPPI) control algorithm based on a generalized importance sampling scheme, with the sampling-based optimization parallelized on a Graphics Processing Unit (GPU). The generalized importance sampling scheme allows changes in both the drift and diffusion terms of the sampled stochastic diffusion processes, and this flexibility is central to the performance of the model predictive control algorithm. We compare the proposed algorithm in simulation with a model predictive control version of differential dynamic programming (DDP).
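To make the idea concrete, the following is a minimal sketch of one MPPI update step in NumPy. It is an illustrative reconstruction, not the paper's implementation: the function and parameter names (`mppi_step`, `lam`, `sigma`, `K`) are hypothetical, the rollouts are computed sequentially rather than in parallel on a GPU, and the perturbations are drawn from a fixed-covariance Gaussian rather than the paper's generalized importance sampling scheme.

```python
import numpy as np

def mppi_step(x0, U, dynamics, cost, K=256, lam=1.0, sigma=0.5, rng=None):
    """One MPPI update: sample K perturbed control sequences around the
    nominal sequence U, roll each one out through the dynamics, and
    re-weight the perturbations by their exponentiated trajectory cost."""
    rng = np.random.default_rng(rng)
    T = U.shape[0]
    eps = rng.normal(0.0, sigma, size=(K,) + U.shape)  # control perturbations
    costs = np.zeros(K)
    for k in range(K):
        x = np.array(x0, dtype=float)
        for t in range(T):
            u = U[t] + eps[k, t]
            x = dynamics(x, u)
            costs[k] += cost(x, u)
    beta = costs.min()                      # subtract min cost for stability
    w = np.exp(-(costs - beta) / lam)       # path-integral importance weights
    w /= w.sum()
    # New nominal sequence: cost-weighted average of the perturbations.
    return U + np.tensordot(w, eps, axes=1)
```

A toy usage, steering a 1-D single integrator toward the origin with a quadratic state cost, would look like `mppi_step(np.array([1.0]), np.zeros((10, 1)), lambda x, u: x + 0.1 * u, lambda x, u: float(x @ x))`; the temperature `lam` trades off averaging over many samples against committing to the lowest-cost rollouts.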
