A hierarchical learning architecture with multiple-goal representations based on adaptive dynamic programming

In this paper we propose a hierarchical learning architecture with multiple-goal representations based on adaptive dynamic programming (ADP). The key idea of this architecture is to integrate a reference network that provides an internal reinforcement representation (a secondary reinforcement signal) that interacts with the operation of the learning system. This reference network plays an important role in building internal goal representations. Furthermore, motivated by recent findings in neurobiology and psychology, the proposed ADP architecture can be designed hierarchically, so that internal reinforcement signals at different levels represent multi-level goals for the intelligent system. A detailed system-level architecture, the learning and adaptation principles, and simulation results are presented to demonstrate the effectiveness of the proposed approach.
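To make the described signal flow concrete, the following is a minimal sketch of how a stack of reference networks could refine the primary (external) reinforcement into multi-level internal reinforcement signals that a critic then consumes. This is an illustrative assumption-laden sketch, not the paper's reference implementation: the use of PyTorch, the module names, the network sizes, and the exact way signals are concatenated are all choices made here for clarity.

```python
# A minimal sketch of a hierarchical ADP forward pass (assumptions: PyTorch,
# small MLPs, and this particular signal flow are illustrative choices,
# not the architecture specified in the paper).
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=16):
    """Small two-layer network used for every module in this sketch."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                         nn.Linear(hidden, out_dim))

class HierarchicalADP(nn.Module):
    """Hierarchy of reference networks feeding a critic and an actor.

    The top level receives the primary (external) reinforcement signal;
    each reference network maps (state, signal from the level above) to
    an internal reinforcement signal for the level below, representing
    one level of the multi-level goal hierarchy.
    """
    def __init__(self, state_dim, action_dim, levels=2):
        super().__init__()
        self.refs = nn.ModuleList(
            mlp(state_dim + 1, 1) for _ in range(levels))
        self.critic = mlp(state_dim + action_dim + 1, 1)  # value estimate
        self.actor = mlp(state_dim, action_dim)

    def forward(self, state, primary_reward):
        signal = primary_reward                   # external reinforcement
        for ref in self.refs:                     # refine level by level
            signal = ref(torch.cat([state, signal], dim=-1))
        action = torch.tanh(self.actor(state))
        value = self.critic(torch.cat([state, action, signal], dim=-1))
        return action, value, signal

# Usage: one forward pass on a toy 4-dimensional state (e.g. cart-pole).
model = HierarchicalADP(state_dim=4, action_dim=1)
action, value, internal_signal = model(torch.zeros(1, 4), torch.zeros(1, 1))
```

In an actor-critic ADP scheme of this kind, the critic and the reference networks would typically be adapted by temporal-difference-style errors and the actor through the critic's gradient; those training details are omitted here.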
