Value function approximation and model predictive control

Both global methods and on-line trajectory optimization methods are powerful techniques for solving optimal control problems; however, each has limitations. To mitigate the undesirable properties of each, we explore combining the two. Specifically, we investigate two ways of deriving a descriptive final cost function that helps model predictive control (MPC) select a good policy without planning far into the future or relying on delicately tuned cost functions. First, we exploit the large amount of data generated in MPC simulations (based on the receding-horizon iterative LQG method) to learn, off-line, the global optimal value function for use as a final cost. We demonstrate that, although the global function approximator matches the value function well on some problems, it yields relatively little improvement over the original MPC. Alternatively, we solve the Bellman equation directly using aggregation methods for linearly-solvable Markov Decision Processes to obtain approximations of both the value function and the optimal policy. Using both pieces of information in the MPC framework, we obtain controller performance comparable to that of MPC alone with a long horizon, while drastically shortening the horizon. These implementations show that Bellman-equation-based methods and on-line trajectory optimization can be combined in real applications, to the benefit of both.
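To make the first idea concrete, the sketch below runs short-horizon MPC on a toy double integrator where the terminal cost is a stand-in learned value function. This is a minimal illustration, not the paper's implementation: the paper uses receding-horizon iLQG with MuJoCo models, whereas here the dynamics, costs, the quadratic v_hat, and the scipy-based optimizer are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Toy double-integrator dynamics: state x = [position, velocity], control u = force.
dt = 0.05

def step(x, u):
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

def running_cost(x, u):
    return x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u ** 2

# Stand-in for a learned terminal value function V_hat(x): a fixed quadratic here;
# in the paper it would be fit off-line from long-horizon MPC data or LMDP aggregation.
def v_hat(x):
    return 5.0 * x[0] ** 2 + 2.0 * x[1] ** 2

def mpc_action(x0, horizon=10):
    """Plan over a short horizon, closing it with v_hat as the final cost."""
    def total_cost(u_seq):
        x, c = x0, 0.0
        for u in u_seq:
            c += running_cost(x, u)
            x = step(x, u)
        return c + v_hat(x)            # learned value function replaces the long tail
    res = minimize(total_cost, np.zeros(horizon), method="L-BFGS-B")
    return res.x[0]                    # receding horizon: apply only the first control

x = np.array([1.0, 0.0])
for t in range(100):
    x = step(x, mpc_action(x))
print("final state:", x)
```

The second approach rests on the linear Bellman equation for linearly-solvable MDPs, where the desirability z = exp(-V) satisfies a linear fixed-point relation. The following sketch solves that equation on a small invented chain problem with an absorbing goal and recovers a value function and optimal controlled transition probabilities; a true aggregation method would operate on a coarse set of cluster centers rather than the full state set.

```python
import numpy as np

# Linearly-solvable MDP on a 1-D chain with an absorbing goal at the right end.
# Desirability z = exp(-V) satisfies z(i) = exp(-q(i)) * sum_j p(j|i) z(j),
# solved here by fixed-point iteration over the full (non-aggregated) state set.
n = 20
q = np.ones(n)          # state cost per step
q[-1] = 0.0             # goal state is free
P = np.zeros((n, n))    # passive (uncontrolled) random-walk dynamics
for i in range(n - 1):
    P[i, max(i - 1, 0)] += 0.5
    P[i, i + 1] += 0.5
P[-1, -1] = 1.0         # goal is absorbing

z = np.ones(n)
for _ in range(1000):
    z_new = np.exp(-q) * (P @ z)
    z_new[-1] = 1.0     # boundary condition at the absorbing goal
    if np.max(np.abs(z_new - z)) < 1e-10:
        z = z_new
        break
    z = z_new

V = -np.log(z)                        # approximate optimal value function
policy = (P * z) / (P @ z)[:, None]   # optimal controlled transition probabilities
print(np.round(V, 2))
```

In the paper's setting, the value function and policy obtained this way would then be plugged back into the MPC loop above, playing the role of v_hat and a warm-start policy so that a much shorter planning horizon suffices.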
