Direct heuristic dynamic programming with augmented states

This paper addresses a design issue in approximate dynamic programming structures and the associated convergence properties. Specifically, we propose imposing a PID structure on the action and critic networks of the direct heuristic dynamic programming (direct HDP) online learning controller. We demonstrate that direct HDP with such PID-augmented states converges faster and outperforms a traditional PID controller, even when the learning controller is initialized to behave like a PID. In addition, for the first time, using a Lyapunov approach we show that the action and critic network weights remain uniformly ultimately bounded (UUB) under mild conditions.
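The core ideas in the abstract can be illustrated in code. The sketch below is a minimal, hypothetical rendering, not the paper's exact algorithm: `pid_augment` builds the PID-style augmented state (error, its integral, its derivative) from the tracking error, and a one-hidden-layer critic is trained online with a semi-gradient temporal-difference update, which stands in for the paper's critic learning rule. The actor network and the UUB analysis are omitted for brevity; all names, network sizes, and learning rates are illustrative assumptions.

```python
import numpy as np

def pid_augment(e, e_int, e_prev, dt=0.02):
    """Build the PID-style augmented state [e, integral of e, de/dt]
    from the current tracking error e. Returns the augmented state and
    the updated integral term. (Hypothetical helper for illustration.)"""
    e_int = e_int + e * dt          # accumulate the integral term
    e_der = (e - e_prev) / dt       # finite-difference derivative term
    return np.array([e, e_int, e_der]), e_int

class Critic:
    """One-hidden-layer critic approximating the cost-to-go J(x),
    trained online by semi-gradient TD(0). Sizes and rates are
    illustrative assumptions, not values from the paper."""
    def __init__(self, n_in, n_hidden=8, gamma=0.95, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.3, (n_hidden, n_in))
        self.W2 = rng.normal(0.0, 0.3, (1, n_hidden))
        self.gamma, self.lr = gamma, lr

    def value(self, x):
        h = np.tanh(self.W1 @ x)            # hidden activations
        return float(self.W2 @ h), h

    def td_update(self, x, x_next, r):
        """One online update from the transition (x, r, x_next)."""
        J, h = self.value(x)
        J_next, _ = self.value(x_next)
        delta = r + self.gamma * J_next - J  # TD error
        # semi-gradient step: move J(x) toward the bootstrapped target
        self.W2 += self.lr * delta * h[None, :]
        grad_W1 = (self.W2.T * (1.0 - h**2)[:, None]) @ x[None, :]
        self.W1 += self.lr * delta * grad_W1
        return delta
```

In this sketch, initializing the controller "like a PID" amounts to starting from fixed gains applied to the three components of the augmented state; the learning updates then adapt the network weights away from that initial PID behavior.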
