Using a Memory of Motion to Efficiently Warm-Start a Nonlinear Predictive Controller

Predictive control is an efficient model-based methodology for controlling complex dynamical systems. In general, it boils down to solving a large nonlinear optimization problem at each control cycle. A critical issue is then to provide a good initial guess to the nonlinear solver so as to speed up convergence. This is particularly important when disturbances or changes in the environment prevent the trajectory computed at the previous control cycle from being used as the initial guess. In this paper, we introduce an original and efficient solution to automatically build this initial guess. We propose to rely on off-line computation to build an approximation of the optimal trajectories, which can then be used on-line to initialize the predictive controller. To that end, we combine sampling-based planning, policy learning with generic representations (such as neural networks), and direct optimal control. We first propose an algorithm that simultaneously builds a kinodynamic probabilistic roadmap (PRM) and approximations of the value function and control policy. This algorithm quickly converges toward an approximation of the optimal state-control trajectories (along with an optimal PRM). We then propose two methods to store the optimal trajectories and use them to initialize the predictive controller. We show experimentally that directly storing the state-control trajectories leads the predictive controller to converge quickly (in 2 to 5 iterations) toward the (globally) optimal solution. The results are validated in simulation with an unmanned aerial vehicle (UAV) and other dynamical systems.
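To make the warm-starting idea concrete, the sketch below shows one plausible way to retrieve a stored state-control trajectory from a memory of motion and hand it to a predictive controller as an initial guess. This is a minimal illustration, not the authors' implementation: the names `MotionMemory` and `solve_ocp` are hypothetical, and the solver is a placeholder for a real nonlinear predictive-control routine.

```python
# Minimal sketch (assumptions: names and solver are placeholders, not the paper's code).
import numpy as np


class MotionMemory:
    """Stores optimal state-control trajectories indexed by their initial state."""

    def __init__(self):
        self.keys = []          # initial states
        self.trajectories = []  # (states, controls) pairs

    def add(self, x0, states, controls):
        self.keys.append(np.asarray(x0, dtype=float))
        self.trajectories.append((np.asarray(states), np.asarray(controls)))

    def query(self, x0):
        """Return the stored trajectory whose initial state is closest to x0."""
        x0 = np.asarray(x0, dtype=float)
        dists = [np.linalg.norm(k - x0) for k in self.keys]
        return self.trajectories[int(np.argmin(dists))]


def solve_ocp(x0, guess_states, guess_controls, iters=5):
    """Placeholder for the nonlinear predictive-control solver.
    A real solver (e.g. DDP or direct multiple shooting) would refine
    the warm start over a few iterations; here we just return the guess."""
    return guess_states, guess_controls


# Off-line: populate the memory (here with a trivial dummy trajectory).
memory = MotionMemory()
memory.add(x0=[0.0, 0.0],
           states=np.zeros((11, 2)),
           controls=np.zeros((10, 1)))

# On-line: at each control cycle, retrieve the closest stored trajectory
# and use it as the initial guess of the predictive controller.
x_current = np.array([0.1, -0.05])
guess_x, guess_u = memory.query(x_current)
plan_x, plan_u = solve_ocp(x_current, guess_x, guess_u)
```

The same interface could be backed by a learned policy or value-function approximator instead of nearest-neighbor lookup, which is the alternative storage method the abstract alludes to.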
