Comparison of Echo State Networks with Simple Recurrent Networks and Variable-Length Markov Models on Symbolic Sequences

Considerable attention is currently focused on the family of connectionist models known as "reservoir computing". The most prominent representative of this approach is a recurrent neural network architecture called the echo state network (ESN). ESNs have been applied successfully to a number of real-valued time-series modeling tasks, where they performed exceptionally well, and using ESNs to process symbolic sequences therefore also appears attractive. In this work we provide experimental support for the claim that, when processing symbolic sequences, the state space of an ESN is organized according to the principles of Markovian architectural bias. We compare the performance of ESNs with connectionist models that explicitly exploit the Markovian architectural bias property, with variable-length Markov models, and with recurrent neural networks trained by advanced training algorithms. Moreover, we show that the number of reservoir units plays a role similar to that of the number of contexts in variable-length Markov models.
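
Since the ESN state update and the trained linear readout are the core technical ingredients referred to above, a minimal sketch may help fix ideas. The snippet below drives a randomly generated reservoir with one-hot encoded symbols and trains only a linear readout for next-symbol prediction; the alphabet size, reservoir size, spectral radius, input scaling, and ridge parameter are illustrative assumptions, not the settings used in the paper.

```python
# Minimal echo state network sketch for symbolic sequences (illustrative only).
# All hyperparameter values below are assumptions for demonstration purposes.
import numpy as np

rng = np.random.default_rng(0)

n_symbols = 4        # alphabet size for one-hot inputs (assumed)
n_reservoir = 100    # number of reservoir units (assumed)
spectral_radius = 0.9
input_scale = 1.0

# Fixed random input and reservoir weights; only the readout is trained.
W_in = input_scale * rng.uniform(-1, 1, size=(n_reservoir, n_symbols))
W = rng.uniform(-1, 1, size=(n_reservoir, n_reservoir))
W *= spectral_radius / max(abs(np.linalg.eigvals(W)))  # rescale toward the echo state property

def run_reservoir(symbols):
    """Drive the reservoir with a symbol sequence; return the state at each step."""
    x = np.zeros(n_reservoir)
    states = []
    for s in symbols:
        u = np.zeros(n_symbols)
        u[s] = 1.0                       # one-hot encoding of the current symbol
        x = np.tanh(W_in @ u + W @ x)    # standard ESN state update
        states.append(x.copy())
    return np.array(states)

# Toy data: a random symbol sequence; the task is next-symbol prediction.
seq = rng.integers(0, n_symbols, size=2000)
X = run_reservoir(seq[:-1])
Y = np.eye(n_symbols)[seq[1:]]           # one-hot targets (next symbols)

# Linear readout fitted by ridge regression; only these weights are learned.
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_reservoir), X.T @ Y).T

# Next-symbol probabilities via a softmax over the linear readout (one possible choice).
logits = X @ W_out.T
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```

Because recent symbols dominate the reservoir state under such a contractive update, states driven by sequences sharing a common suffix cluster together, which is the Markovian organization of the state space the paper examines.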
