Differential Dynamic Programming for time-delayed systems

Trajectory optimization is the problem of finding controls that drive a dynamical system along a trajectory minimizing a given cost function. Differential Dynamic Programming (DDP) is an optimal control method that uses a second-order approximation of the problem to find the control. It is fast enough to allow real-time control and has been shown to work well for trajectory optimization in robotic systems. Here we extend classic DDP to systems with multiple time-delays in the state. Being able to find optimal trajectories for time-delayed systems with DDP makes it possible to use richer models for system identification and control, including recurrent neural networks with multiple timesteps in the state. We demonstrate the algorithm on a two-tank continuous stirred tank reactor, and on a recurrent neural network trained to model an inverted pendulum from position information only.
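To make the time-delay idea concrete, the standard trick is to stack the delayed states into an augmented state vector, after which the usual backward sweep applies. The sketch below is not the paper's algorithm: it uses a toy scalar system with a hypothetical delay term and a finite-horizon LQR Riccati recursion (the linear-quadratic analogue of DDP's backward pass) purely to illustrate the augmentation.

```python
import numpy as np

# Toy delayed scalar dynamics (hypothetical): x[t+1] = a*x[t] + c*x[t-d] + u[t]
a, c, d = 0.9, 0.3, 2       # delay of d steps
n = d + 1                   # augmented state z[t] = (x[t], x[t-1], ..., x[t-d])

# Augmented linear dynamics: z[t+1] = A z[t] + B u[t]
A = np.zeros((n, n))
A[0, 0] = a                 # dependence on the current state
A[0, d] = c                 # dependence on the d-step-delayed state
A[1:, :-1] = np.eye(d)      # shift register carries past states forward
B = np.zeros((n, 1))
B[0, 0] = 1.0

# Finite-horizon LQR backward sweep (the Riccati analogue of DDP's backward pass)
Q = np.zeros((n, n))
Q[0, 0] = 1.0               # penalize only the current (undelayed) state
R = np.array([[0.1]])
T = 50
P = Q.copy()
gains = []
for _ in range(T):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # feedback gain
    P = Q + A.T @ P @ (A - B @ K)                      # cost-to-go update
    gains.append(K)
gains.reverse()             # gains[t] now corresponds to timestep t

# Forward rollout from an initial history of x = 1 at all past steps
z = np.ones((n, 1))
for K in gains:
    u = -K @ z
    z = A @ z + B @ u
print(float(z[0, 0]))       # current state, driven toward zero
```

The open-loop system above is unstable (its dominant root exceeds 1), so the rollout shows the augmented-state controller doing real work: the full nonlinear DDP version in the paper replaces the constant (A, B, Q, R) with local expansions of the dynamics and cost along the current trajectory.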
