Trajectory-Based Dynamic Programming

We informally review our approach to using trajectory optimization to accelerate dynamic programming. Dynamic programming provides a way to design globally optimal control laws for nonlinear systems. However, the curse of dimensionality, the exponential growth of the memory and computation required as the dimensionality of the state and control spaces increases, limits the application of dynamic programming in practice. We explore trajectory-based dynamic programming, which combines many local trajectory optimizations to accelerate the global optimization performed by dynamic programming. We are able to solve problems with fewer resources than grid-based approaches require, and to solve problems we could not previously solve using tabular or global function approximation approaches. The sketch below illustrates how the two levels of optimization fit together.
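
What follows is a minimal sketch of the combination, not the implementation from the paper: local trajectory optimizations are run from sampled start states, each one using the current global value-function estimate as its terminal cost and then depositing its cost-to-go samples back into a trajectory library, which in turn defines the next value-function estimate. The double-integrator plant, cost weights, horizon, sampling ranges, and nearest-neighbor approximator are all illustrative assumptions.

    # A minimal sketch (not the paper's implementation): local trajectory
    # optimizations build a nearest-neighbor value-function approximation.
    # Plant, costs, horizon, and sampling ranges are illustrative assumptions.
    import numpy as np
    from scipy.optimize import minimize

    DT, H = 0.1, 20                            # time step and local horizon

    def step(x, u):
        """Double-integrator dynamics: state = (position, velocity)."""
        return np.array([x[0] + DT * x[1], x[1] + DT * u])

    def running_cost(x, u):
        return DT * (x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u ** 2)

    library = []                               # (state, cost-to-go, action) triples

    def value(x):
        """Global value estimate: nearest stored cost-to-go, or a crude prior."""
        if not library:
            return 10.0 * float(x @ x)
        i = np.argmin([np.linalg.norm(x - s) for s, _, _ in library])
        return library[i][1]

    def trajectory_cost(us, x0):
        """Open-loop cost of an action sequence, capped with the value estimate."""
        x, c = np.array(x0, dtype=float), 0.0
        for u in us:
            c += running_cost(x, u)
            x = step(x, u)
        return c + value(x)

    rng = np.random.default_rng(0)
    for _ in range(50):                        # many local optimizations
        x0 = rng.uniform(-1.0, 1.0, size=2)
        us = minimize(trajectory_cost, np.zeros(H), args=(x0,),
                      method="Powell").x
        # Replay the optimized trajectory and store cost-to-go at every state.
        xs, x = [], x0.copy()
        for u in us:
            xs.append((x.copy(), u))
            x = step(x, u)
        ctg = value(x)                         # terminal value at the last state
        for xk, uk in reversed(xs):
            ctg += running_cost(xk, uk)
            library.append((xk, ctg, uk))

    def policy(x):
        """Act like the nearest library point; queries stay cheap and local."""
        i = np.argmin([np.linalg.norm(x - s) for s, _, _ in library])
        return library[i][2]

    print(policy(np.array([0.5, -0.2])))       # control for an example query state

The loop captures the key feedback in the approach: each local optimizer consults the global value estimate through its terminal cost and then improves that estimate with new cost-to-go samples, so value information accumulates only where trajectories actually go rather than on a dense grid over the full state space.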
