Adaptive Learning in Tracking Control Based on the Dual Critic Network Design

In this paper, we present a new adaptive dynamic programming (ADP) approach that integrates a reference network providing an internal goal representation to aid the system's learning and optimization. Specifically, we build the reference network on top of the critic network to form a dual critic network design, in which the detailed internal goal representation helps approximate the value function. This internal goal signal, which serves as the reinforcement signal for the critic network in our design, is generated adaptively by the reference network and adjusted automatically during learning. This offers an alternative to crafting the reinforcement signal manually from prior knowledge. We adopt the online action-dependent heuristic dynamic programming (ADHDP) design and provide the detailed structure of the dual critic network. A detailed Lyapunov stability analysis of the proposed approach is presented to support the structure from a theoretical standpoint. Furthermore, we develop a virtual reality platform to demonstrate real-time simulation of our approach under different disturbance conditions. The overall adaptive learning performance is tested on two tracking control benchmarks with a tracking filter. For comparison, we also report the tracking performance of the typical ADHDP, and the simulation results show improved performance with our approach.
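The abstract does not spell out the update rules, so the following is only a rough sketch of how one online step of such a dual critic ADHDP loop might look. It assumes the error forms commonly used in online ADHDP (Si and Wang style), with the reference network's output s(t) taking the place of the external reinforcement r(t) in the critic's error. Everything specific here is an assumption for illustration, not the paper's method: the linear approximators (used instead of multilayer networks), the class name `DualCriticADHDP`, the discount factor `ALPHA`, the learning rates, and the exact error definitions.

```python
import random

ALPHA = 0.95  # discount factor (assumed value)
U_C = 0.0     # ultimate objective for J(t) (assumed, as in typical ADHDP)


def dot(w, z):
    """Inner product of two equal-length sequences."""
    return sum(wi * zi for wi, zi in zip(w, z))


class DualCriticADHDP:
    """Hypothetical sketch: action, reference and critic approximators.

    Linear stand-ins replace the neural networks for brevity (assumption).
    """

    def __init__(self, n_x, lr_a=0.01, lr_f=0.05, lr_c=0.05,
                 init_scale=0.1, seed=0):
        rng = random.Random(seed)

        def init(n):
            return [rng.uniform(-init_scale, init_scale) for _ in range(n)]

        self.n_x = n_x
        self.wa = init(n_x)        # action:    u(t) = wa . x(t)
        self.wf = init(n_x + 1)    # reference: s(t) = wf . [x(t), u(t)]
        self.wc = init(n_x + 2)    # critic:    J(t) = wc . [x(t), u(t), s(t)]
        self.lr_a, self.lr_f, self.lr_c = lr_a, lr_f, lr_c
        self.s_prev = 0.0          # s(t-1)
        self.J_prev = 0.0          # J(t-1)

    def forward(self, x):
        """Chain the three approximators: x -> u -> s -> J."""
        u = dot(self.wa, x)
        z_f = list(x) + [u]
        s = dot(self.wf, z_f)
        z_c = z_f + [s]
        J = dot(self.wc, z_c)
        return u, z_f, s, z_c, J

    def update(self, x, r):
        """One online step given state x(t) and external reinforcement r(t)."""
        u, z_f, s, z_c, J = self.forward(x)

        # Reference network learns the internal goal s(t) from r(t).
        e_f = ALPHA * s - (self.s_prev - r)
        # Critic uses s(t), not r(t), as its reinforcement signal.
        e_c = ALPHA * J - (self.J_prev - s)
        # Action network drives J(t) toward U_C.
        e_a = J - U_C

        # dJ/du: direct path plus the path through s = wf . [x, u].
        n = self.n_x
        dJ_du = self.wc[n] + self.wc[n + 1] * self.wf[n]

        # Semi-gradient descent on E = 0.5 * e^2, holding s(t-1), J(t-1)
        # and the critic's input s(t) fixed; all gradients use old weights.
        self.wf = [w - self.lr_f * e_f * ALPHA * z
                   for w, z in zip(self.wf, z_f)]
        self.wc = [w - self.lr_c * e_c * ALPHA * z
                   for w, z in zip(self.wc, z_c)]
        self.wa = [w - self.lr_a * e_a * dJ_du * xi
                   for w, xi in zip(self.wa, x)]

        self.s_prev, self.J_prev = s, J
        return u, s, J, e_f, e_c, e_a
```

Calling `update` once per time step with the current state and the external reinforcement (for example 0 while tracking is acceptable and -1 on failure, an assumed convention) returns u(t), s(t), J(t) and the three errors. The point the sketch tries to make concrete is the dual critic wiring: the critic's temporal-difference error is driven by the learned signal s(t) rather than by the hand-crafted r(t).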