Infinite-Horizon Model Predictive Control for Periodic Tasks with Contacts

We present a method that combines offline trajectory optimization and online Model Predictive Control (MPC), generating robust controllers for complex periodic behavior in domains with unilateral constraints (e.g., contact with the environment). MPC offers robust and adaptive control even in high-dimensional domains; however, the online optimization gets stuck in local minima when the domain has discontinuous dynamics. Some trajectory-optimization methods are immune to such problems, but these are often too slow to be applied online. In this paper, we use offline optimization to find the limit-cycle solution of an infinite-horizon average-cost optimal-control task. We then compute a local quadratic approximation of the value function around this limit cycle. Finally, we use this quadratic approximation as the terminal cost of an online MPC. This combination of an offline solution of the infinite-horizon problem with an online MPC controller is known as Infinite-Horizon Model Predictive Control (IHMPC), and has previously been applied only to simple stabilization objectives. Here we extend IHMPC to tackle periodic tasks, and demonstrate the power of our approach by synthesizing hopping behavior in a simulated robot. IHMPC involves a limited computational load and can be executed online on a standard laptop; the resulting behavior is extremely robust, allowing the hopper to recover from virtually any perturbation. In real robotic domains, modeling errors are inevitable. We demonstrate the robustness of IHMPC to such errors by altering the robot's morphology: the same controller remains effective even when the underlying infinite-horizon solution is no longer accurate.
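To make the pipeline concrete, the following is a minimal Python sketch of the online IHMPC loop described above: a finite-horizon trajectory optimization whose terminal cost is a quadratic approximation of the infinite-horizon value function around a precomputed limit cycle. It substitutes a toy double integrator for the paper's hopper, and the offline products (cycle states and value Hessians) are stubbed with hand-picked placeholders; all names here (dynamics_step, LimitCycleValue, ihmpc_step) are illustrative, not from the paper.

    import numpy as np
    from scipy.optimize import minimize

    DT = 0.02         # integration step [s]
    HORIZON = 20      # MPC planning horizon (steps)

    def dynamics_step(x, u):
        # Toy double-integrator: x = [position, velocity], u = acceleration.
        return np.array([x[0] + DT * x[1], x[1] + DT * u])

    def running_cost(x, u):
        # Stage cost: track unit forward speed, penalize control effort.
        return (x[1] - 1.0) ** 2 + 1e-2 * u ** 2

    class LimitCycleValue:
        # Quadratic approximation of the infinite-horizon value function
        # around a limit cycle computed offline (states + value Hessians).
        def __init__(self, cycle_states, value_hessians):
            self.cycle = cycle_states       # shape (K, n)
            self.Vxx = value_hessians       # shape (K, n, n)

        def __call__(self, x):
            # Evaluate around the nearest point on the limit cycle.
            k = np.argmin(np.linalg.norm(self.cycle - x, axis=1))
            dx = x - self.cycle[k]
            return 0.5 * dx @ self.Vxx[k] @ dx

    def ihmpc_step(x0, terminal_value, u_warm):
        # One receding-horizon step: finite-horizon optimization whose
        # terminal cost is the offline quadratic value approximation.
        def total_cost(u_seq):
            x, cost = x0, 0.0
            for u in u_seq:
                cost += running_cost(x, u)
                x = dynamics_step(x, u)
            return cost + terminal_value(x)

        u_opt = minimize(total_cost, u_warm).x
        # Apply the first control; shift the plan for warm-starting.
        return u_opt[0], np.append(u_opt[1:], 0.0)

    if __name__ == "__main__":
        # Stub offline products: one cycle point at unit speed, fixed Hessian.
        V = LimitCycleValue(np.array([[0.0, 1.0]]), 10.0 * np.eye(2)[None])
        x, plan = np.array([0.0, 0.0]), np.zeros(HORIZON)
        for _ in range(50):
            u, plan = ihmpc_step(x, V, plan)
            x = dynamics_step(x, u)
        print("final state:", x)

In the paper, the offline stage solves the average-cost problem to obtain the limit cycle and its local quadratic value approximation; here those products are replaced by a single cycle point and a fixed Hessian purely to keep the sketch self-contained, and a generic optimizer stands in for the paper's trajectory-optimization machinery.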
