Direct heuristic dynamic programming based on an improved PID neural network

In this paper, an improved PID neural network (IPIDNN) structure is proposed and applied to the critic and action networks of direct heuristic dynamic programming (DHDP). As an online learning algorithm within approximate dynamic programming (ADP), DHDP has demonstrated its applicability to problems with large state and control spaces. In theory, the DHDP algorithm requires full state feedback in order to obtain solutions to the Bellman optimality equation. Unfortunately, it is not always possible to access all the states of a real system. This paper proposes a solution: an IPIDNN configuration is used to construct the critic and action networks, so that output feedback control is achieved. Since this structure can estimate the integrals and derivatives of measurable outputs, more system state information is utilized and better control performance is therefore expected. Compared with the traditional PIDNN, this configuration is flexible and easy to expand. Based on this structure, a gradient descent algorithm for the IPIDNN-based DHDP is presented. Convergence issues are addressed both within a single learning time step and over the entire learning process, and some important insights are provided to guide the implementation of the algorithm. The proposed learning controller has been applied to a cart-pole system to validate the effectiveness of the structure and the algorithm.
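The abstract describes the approach only in words, so the Python sketch below is one illustrative reading of it, not the authors' implementation. The `IPIDNN` class, the `dhdp_step` helper, the network sizes, learning rates, clipping bounds, and the toy plant in the usage loop are all assumptions chosen for clarity; the temporal-difference critic and action errors follow the standard online DHDP formulation of Si and Wang, and the recurrent gradient paths through the integral and derivative memories are ignored, a common online-training simplification.

```python
import numpy as np

rng = np.random.default_rng(0)


class IPIDNN:
    """Single-hidden-layer network whose hidden units are proportional (P),
    integral (I), and derivative (D) neurons: P units pass their net input
    through, I units accumulate it across time steps, and D units output its
    first difference, so the network can estimate integrals and derivatives
    of the measured outputs in place of unmeasured states."""

    def __init__(self, n_in, n_p, n_i, n_d, n_out):
        n_h = n_p + n_i + n_d
        self.kind = np.array(["P"] * n_p + ["I"] * n_i + ["D"] * n_d)
        self.W1 = rng.uniform(-0.5, 0.5, (n_h, n_in))    # input -> hidden
        self.W2 = rng.uniform(-0.5, 0.5, (n_out, n_h))   # hidden -> output
        self.prev_net = np.zeros(n_h)                    # memory for D units
        self.prev_out = np.zeros(n_h)                    # memory for I units

    def forward(self, x):
        net = self.W1 @ x
        h = np.where(self.kind == "P", net,
            np.where(self.kind == "I", self.prev_out + net,
                                       net - self.prev_net))
        h = np.clip(h, -1.0, 1.0)                        # bounded hidden outputs
        self.prev_net, self.prev_out = net, h
        self.x, self.h = x, h
        return np.tanh(self.W2 @ h)

    def backward(self, grad_out, lr):
        """One gradient-descent step; returns the gradient w.r.t. the input so
        an error can be chained through this network. Recurrent paths through
        the I/D memories are ignored, and the hidden slope is taken as 1
        inside the clipping region."""
        y = np.tanh(self.W2 @ self.h)
        delta = np.atleast_1d(grad_out) * (1.0 - y ** 2)
        grad_h = self.W2.T @ delta
        grad_x = self.W1.T @ grad_h                      # uses pre-update weights
        self.W2 -= lr * np.outer(delta, self.h)
        self.W1 -= lr * np.outer(grad_h, self.x)
        return grad_x


# One DHDP learning step with temporal-difference updates in the style of
# Si and Wang's online DHDP; alpha is the discount factor.
alpha, lr_c, lr_a = 0.95, 0.05, 0.05
action = IPIDNN(n_in=2, n_p=2, n_i=2, n_d=2, n_out=1)   # outputs control u
critic = IPIDNN(n_in=3, n_p=2, n_i=2, n_d=2, n_out=1)   # approximates cost-to-go J


def dhdp_step(y, r, J_prev):
    """y: measured outputs; r: reinforcement (0 = OK, -1 = failure);
    J_prev: critic output from the previous time step."""
    u = action.forward(y)
    J = critic.forward(np.concatenate([y, u]))
    e_c = (alpha * J - (J_prev - r)).item()          # Bellman/TD residual
    dJ_dx = critic.backward(e_c, lr_c)               # critic step; grad w.r.t. [y, u]
    e_a = J.item()                                   # drive J toward the target U_c = 0
    action.backward(e_a * dJ_dx[len(y):], lr_a)      # chain action error through J
    return u, J.item()


# Toy usage: only measured outputs (say, cart position and pole angle of a
# cart-pole) are fed in; the I/D neurons reconstruct velocity-like
# information internally.
J_prev, y = 0.0, np.array([0.0, 0.05])
for t in range(10):
    u, J_prev = dhdp_step(y, r=0.0, J_prev=J_prev)
    y = y + 0.02 * rng.standard_normal(2)            # stand-in for the real plant
```

The key design point the sketch tries to capture is that the action and critic networks never see velocity states: the integral and derivative neurons carry that information across time steps, which is what lets the DHDP controller work from output feedback alone.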
