Adaptive Learning in Continuous Environments Using Actor-Critic Design and Echo-State Networks

Adaptive dynamic programming has been studied extensively in recent years for its potential to scale to problems involving continuous state and action spaces. The adaptive critic design (ACD) framework addresses this issue and has been demonstrated in several case studies. This paper proposes an implementation of ACD that uses an echo state network (ESN) as the critic. The ESN is trained online to estimate the utility function and to adapt the control policy of an embodied agent. Beyond its simple training algorithm, the ESN structure facilitates the backpropagation of the derivatives needed to adapt the controller. Experimental results with a mobile robot validate the proposed learning architecture.
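The abstract does not spell out the authors' algorithm, so the following is only a minimal Python sketch of the general idea: an ESN critic with a fixed random reservoir and a linear readout is trained online with a temporal-difference rule, and because the value estimate is linear in the reservoir state, its derivative with respect to the action can be backpropagated through the reservoir to adapt a simple actor. The toy plant, reward, learning rates, and all names here are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch of an actor-critic loop with an echo state network (ESN)
# critic. Everything below (plant, reward, parameter values) is a toy
# stand-in for the mobile-robot setup described in the paper.
import numpy as np

rng = np.random.default_rng(0)

# --- ESN critic: fixed random reservoir, trainable linear readout ---
N, n_in = 100, 3                            # reservoir size; inputs = [state, action]
LEAK = 0.3                                  # leaky-integrator rate (assumed)
W_in = rng.uniform(-0.5, 0.5, (N, n_in))
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # scale spectral radius below 1
w_out = np.zeros(N)                         # readout weights -> value estimate
x = np.zeros(N)                             # reservoir state

# --- Linear actor (illustrative controller) ---
theta = np.zeros(2)                         # policy parameters for a 2-d state

gamma, alpha_c, alpha_a = 0.95, 0.05, 0.01  # discount and learning rates

def reservoir_step(x, u):
    """Leaky-integrator reservoir update driven by input u."""
    return (1 - LEAK) * x + LEAK * np.tanh(W_in @ u + W @ x)

def critic(x):
    """Value estimate as a linear readout of the reservoir state."""
    return w_out @ x

state = np.array([1.0, -0.5])
for t in range(200):
    action = float(theta @ state)           # actor: linear policy
    u = np.concatenate([state, [action]])
    x_new = reservoir_step(x, u)

    # Toy linear plant with a quadratic cost (reward = negative cost).
    next_state = 0.9 * state + np.array([0.1, 0.0]) * action
    reward = -np.sum(next_state ** 2)

    # TD error drives the online (LMS-style) update of the critic readout.
    delta = reward + gamma * critic(x_new) - critic(x)
    w_out += alpha_c * delta * x

    # Actor update: the critic is linear in the reservoir state, so dV/da
    # backpropagates through the action column of W_in and the tanh.
    pre = W_in @ u + W @ x
    dV_da = w_out @ (LEAK * (1 - np.tanh(pre) ** 2) * W_in[:, -1])
    theta += alpha_a * dV_da * state        # ascend the estimated value

    state, x = next_state, x_new
```

With the quadratic cost above, ascending the estimated value drives the state toward the origin; the key point mirrored from the abstract is that the fixed reservoir makes both the online critic update and the derivative computation for the controller cheap, since only the linear readout is trained.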
