Modeling reward functions for incomplete state representations via echo state networks

This paper investigates an echo state network (ESN) (Jaeger, 2001; Maass and Markram, 2002) architecture as an approximator of the Q-function for temporally dependent rewards embedded in a linear dynamical system, the mass-spring-damper (MSD). This problem has been solved with feed-forward neural networks (FNNs) when all state information necessary to specify the dynamics is provided as input (Kretchmar, 2000), and with time-delay neural networks (TDNNs) given finite-size windows of incomplete state information. Our research demonstrates that the ESN architecture represents the Q-function of the MSD system given incomplete state information as well as current feed-forward neural networks given either the perfect state or a temporally windowed, incomplete state vector.

The remainder of this paper is organized as follows. We first introduce basic concepts of reinforcement learning and the echo state network architecture. The MSD system simulation is defined in Section IV. Experimental results for learning state quality given incomplete state information are presented in Section V. Results for learning estimates of all future state qualities given incomplete state information are presented in Section VI. Section VII discusses the potential of the ESN for reinforcement learning and outlines current and future directions of research.
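For readers unfamiliar with the architecture, the following sketch illustrates the core idea the paper builds on: a fixed, randomly connected reservoir is driven by the (incomplete) observation, and only a linear readout is trained, here to emit Q-value estimates. This is a minimal illustration only; all sizes, constants, and function names below are assumptions for exposition, not the configuration used in this paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_res, n_act = 1, 100, 3  # incomplete observation, reservoir units, actions

    # Input and reservoir weights are random and stay fixed; only W_out is trained.
    W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1 (echo state property)
    W_out = np.zeros((n_act, n_res))

    x = np.zeros(n_res)  # reservoir state accumulates temporal context across steps

    def q_values(u):
        """Drive the reservoir with observation u; return one Q estimate per action."""
        global x
        x = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        return W_out @ x

    def td_update(q, a, target, lr=1e-3):
        """TD-style correction applied to the linear readout only."""
        W_out[a] += lr * (target - q[a]) * x

Because the recurrent weights are never adapted, the reservoir acts as a fixed temporal kernel over the observation history, which is what allows a purely linear readout to recover Q-values from an incomplete state vector.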

[1] A. P. Wieland, "Evolving neural network controllers for unstable systems," IJCNN-91-Seattle International Joint Conference on Neural Networks, 1991.

[2] P. J. Werbos, "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, 1990.

[3] H. Jaeger, "The 'echo state' approach to analysing and training recurrent neural networks," GMD Report 148, German National Research Center for Information Technology, 2001.

[4] H. Jaeger, "A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the 'echo state network' approach," GMD Report 159, German National Research Center for Information Technology, 2005.

[5] S. E. Dreyfus et al., "On using discretized Cohen-Grossberg node dynamics for model-free actor-critic neural learning in non-Markovian domains," Proceedings of the 2003 IEEE International Symposium on Computational Intelligence in Robotics and Automation, 2003.

[6] J. Si et al., "Backpropagation Through Time and Derivative Adaptive Critics: A Common Framework for Comparison," in Handbook of Learning and Approximate Dynamic Programming, 2004.

[7] E. Mizutani et al., "Two stochastic dynamic programming problems by model-free actor-critic recurrent-network learning in non-Markovian settings," 2004 IEEE International Joint Conference on Neural Networks, 2004.

[8] R. M. Kretchmar, "A synthesis of reinforcement learning and robust control theory," Ph.D. dissertation, Colorado State University, 2000.

[9] P.-G. Plöger et al., "Echo State Networks for Mobile Robot Modeling and Control," RoboCup, 2003.

[10] W. Maass, T. Natschläger, and H. Markram, "Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations," Neural Computation, 2002.

[11] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

[12] J. Vreeken, "On real-world temporal pattern recognition using Liquid State Machines," 2003.