Reinforcement learning control based on multi-goal representation using hierarchical heuristic dynamic programming

We are interested in developing a multi-goal generator to provide detailed goal representations that help to improve the performance of the adaptive critic design (ACD). In this paper we propose a hierarchical structure of goal generator networks to cascade external reinforcement into more informative internal goal representations in the ACD. This is in contrast with previous designs in which the external reward signal is assigned to the critic network directly. The ACD control system performance is evaluated on the ball-and-beam balancing benchmark under noise-free and various noisy conditions. Simulation results in the form of a comparative study demonstrate effectiveness of our approach.

[1]  Roberto A. Santiago,et al.  Adaptive critic designs: A case study for neurocontrol , 1995, Neural Networks.

[2]  P.J. Werbos,et al.  Using ADP to Understand and Replicate Brain Intelligence: the Next Level Design , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.

[3]  Haibo He,et al.  A hierarchical learning architecture with multiple-goal representations based on adaptive dynamic programming , 2010, 2010 International Conference on Networking, Sensing and Control (ICNSC).

[4]  Donald C. Wunsch,et al.  Neurocontroller alternatives for "fuzzy" ball-and-beam systems with nonuniform nonlinear friction , 2000, IEEE Trans. Neural Networks Learn. Syst..

[5]  Paul J. Werbos,et al.  2009 Special Issue: Intelligence in the brain: A theory of how it works and how to build it , 2009 .

[6]  D. Liu,et al.  Adaptive Dynamic Programming for Finite-Horizon Optimal Control of Discrete-Time Nonlinear Systems With $\varepsilon$-Error Bound , 2011, IEEE Transactions on Neural Networks.

[7]  Huaguang Zhang,et al.  An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games , 2011, Autom..

[8]  Haibo He,et al.  Adaptive Learning and Control for MIMO System Based on Adaptive Dynamic Programming , 2011, IEEE Transactions on Neural Networks.

[9]  Jennie Si,et al.  Online learning control by association and reinforcement , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[10]  Huaguang Zhang,et al.  Neural-Network-Based Near-Optimal Control for a Class of Discrete-Time Affine Nonlinear Systems With Control Constraints , 2009, IEEE Transactions on Neural Networks.

[11]  Panos M. Pardalos,et al.  Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw..

[12]  Huaguang Zhang,et al.  Adaptive Dynamic Programming: An Introduction , 2009, IEEE Computational Intelligence Magazine.

[13]  Chung-Cheng Chen,et al.  Stability and Almost Disturbance Decoupling Analysis of Nonlinear System Subject to Feedback Linearization and Feedforward Neural Network Controller , 2008, IEEE Transactions on Neural Networks.

[14]  Huaguang Zhang,et al.  A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[15]  Warren B. Powell,et al.  Approximate Dynamic Programming - Solving the Curses of Dimensionality , 2007 .

[16]  F.L. Lewis,et al.  Reinforcement learning and adaptive dynamic programming for feedback control , 2009, IEEE Circuits and Systems Magazine.

[17]  Warren B. Powell,et al.  Handbook of Learning and Approximate Dynamic Programming , 2006, IEEE Transactions on Automatic Control.

[18]  Haibo He,et al.  A three-network architecture for on-line learning and optimization based on adaptive dynamic programming , 2012, Neurocomputing.

[19]  Lei Yang,et al.  Direct Heuristic Dynamic Programming for Nonlinear Tracking Control With Filtered Tracking Error , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[20]  Donald C. Wunsch,et al.  Adaptive critic designs and their applications , 1997 .