Heuristic dynamic programming with internal goal representation

In this paper, we analyze an internal goal structure based on heuristic dynamic programming, named GrHDP, to tackle the 2-D maze navigation problem. Classical reinforcement learning approaches have been applied to this problem in the literature, but they assign no intermediate reward before the final goal is reached. We integrate one additional network, the goal network, into the traditional heuristic dynamic programming (HDP) design to provide an internal reward/goal representation. The architecture of the proposed approach is presented, followed by a simulation of the 2-D maze navigation problem on a 10 × 10 maze. For a fair comparison, we use the same simulation environment settings for the traditional HDP approach. Simulation results show that the proposed GrHDP converges faster with respect to the sum of squared errors and also achieves a lower final error.
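The three-network wiring described above can be illustrated with a minimal sketch. It is not the authors' implementation: the network sizes, the tanh units, the discount factor `alpha`, and the form of the two error signals are assumptions borrowed from the way goal-representation HDP is commonly formulated, and all function names are hypothetical. The point is only to show where the goal network sits: it turns the state and action into an internal reward `s`, which the critic then uses in place of the sparse external reward `r`.

```python
import math
import random

def init(n_in, n_hidden, n_out, rng):
    """Random weights for a one-hidden-layer network (sizes are assumptions)."""
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    return w_in, w_out

def mlp(w_in, w_out, inputs):
    """Forward pass with tanh hidden and output units; returns a list."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in w_in]
    return [math.tanh(sum(w * h for w, h in zip(row, hidden))) for row in w_out]

def grhdp_forward(state, action_net, goal_net, critic_net):
    """Action -> internal reward -> cost-to-go, in that order."""
    u = mlp(*action_net, state)            # action from the state
    s = mlp(*goal_net, state + u)          # internal reward from state and action
    j = mlp(*critic_net, state + u + s)    # critic also sees the internal reward
    return u, s[0], j[0]

def td_errors(s_now, s_prev, j_now, j_prev, r, alpha=0.95):
    """Assumed error signals: the goal network tracks the external reward r,
    the critic tracks the internal reward s instead of r."""
    e_goal = alpha * s_now - (s_prev - r)
    e_critic = alpha * j_now - (j_prev - s_now)
    return e_goal, e_critic

rng = random.Random(0)
n_state, n_action, n_hidden = 2, 1, 6      # hypothetical sizes: (x, y) maze position
action_net = init(n_state, n_hidden, n_action, rng)
goal_net = init(n_state + n_action, n_hidden, 1, rng)
critic_net = init(n_state + n_action + 1, n_hidden, 1, rng)

u, s, j = grhdp_forward([0.1, 0.9], action_net, goal_net, critic_net)
```

In a traditional HDP design the goal network is absent and the critic error would use the external reward directly, so in a maze the critic receives no informative signal until the goal cell is reached.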
