Receding Horizon Differential Dynamic Programming

The control of high-dimensional, continuous, non-linear dynamical systems is a key problem in reinforcement learning and control. Local, trajectory-based methods, using techniques such as Differential Dynamic Programming (DDP), are not directly subject to the curse of dimensionality, but generate only local controllers. In this paper, we introduce Receding Horizon DDP (RH-DDP), an extension to the classic DDP algorithm, which allows us to construct stable and robust controllers based on a library of local-control trajectories. We demonstrate the effectiveness of our approach on a series of high-dimensional problems using a simulated multi-link swimming robot. These experiments show that our approach effectively circumvents dimensionality issues, and is capable of dealing with problems of (at least) 24 state and 9 action dimensions.
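As a rough illustration of the receding-horizon idea only (not the paper's implementation), the sketch below re-optimizes a short finite-horizon trajectory at every time step with an iLQR-style backward/forward pass, applies only the first control, and warm-starts the next plan by shifting the previous one. The double-integrator dynamics, the quadratic costs, the horizon lengths, and all names (`ddp_plan`, `receding_horizon_control`) are illustrative assumptions, not the swimmer model or the exact algorithm of the paper.

```python
import numpy as np

# Minimal receding-horizon DDP sketch (iLQR-style: first-order dynamics
# expansion, quadratic cost expansion). The double-integrator dynamics and
# quadratic costs below are illustrative stand-ins, not the swimmer model.

dt = 0.05
A = np.array([[1.0, dt], [0.0, 1.0]])   # x_{t+1} = A x_t + B u_t
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])                 # running state cost 0.5 x'Qx
R = np.array([[0.01]])                  # running control cost 0.5 u'Ru
Qf = np.diag([10.0, 1.0])               # final state cost 0.5 x'Qf x

def dynamics(x, u):
    return A @ x + B @ u

def ddp_plan(x0, U, iters=20):
    """One finite-horizon trajectory optimization (backward + forward pass)."""
    N = len(U)
    for _ in range(iters):
        # Forward rollout of the current nominal control sequence.
        X = [x0]
        for t in range(N):
            X.append(dynamics(X[t], U[t]))
        # Backward pass: propagate a quadratic value-function approximation.
        Vx, Vxx = Qf @ X[N], Qf
        k, K = [None] * N, [None] * N
        for t in reversed(range(N)):
            Qx = Q @ X[t] + A.T @ Vx
            Qu = R @ U[t] + B.T @ Vx
            Qxx = Q + A.T @ Vxx @ A
            Quu = R + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            k[t] = -np.linalg.solve(Quu, Qu)    # feed-forward correction
            K[t] = -np.linalg.solve(Quu, Qux)   # locally-linear feedback gain
            Vx = Qx + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
            Vxx = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
        # Forward pass: roll out the updated locally-linear policy.
        x_new, U_new = x0, []
        for t in range(N):
            u = U[t] + k[t] + K[t] @ (x_new - X[t])
            U_new.append(u)
            x_new = dynamics(x_new, u)
        U = U_new
    return U

def receding_horizon_control(x0, horizon=30, steps=100):
    """Receding-horizon loop: re-plan each step, apply only the first control."""
    x = x0
    U = [np.zeros(1) for _ in range(horizon)]
    trajectory = [x]
    for _ in range(steps):
        U = ddp_plan(x, U)
        x = dynamics(x, U[0])            # execute only the first control
        U = U[1:] + [U[-1]]              # warm-start: shift the plan forward
        trajectory.append(x)
    return trajectory

traj = receding_horizon_control(np.array([1.0, 0.0]))
print("final state:", traj[-1])
```

This toy loop tracks only a single trajectory; in the paper, locally-linear feedback controllers obtained from many such trajectories are collected into a library to yield a stable and robust global controller.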
