Echo State Networks (ESNs) have been shown to be effective on a number of tasks, including motor control, dynamic time series prediction, and memorizing musical sequences. However, their performance on natural language tasks has remained largely unexplored until now. Simple Recurrent Networks (SRNs) have a long history in language modeling and are strikingly similar in architecture to ESNs, so a comparison of the two on a natural language task is a natural experiment. Elman applied SRNs to a standard task in statistical NLP: predicting the next word in a corpus given the previous words. Using a simple context-free grammar and an SRN trained with backpropagation through time (BPTT), he showed that the network learned internal representations sensitive to the linguistic structure needed for the prediction task. Here, using ESNs, we show that training such internal representations is unnecessary to achieve performance comparable to SRNs. We also compare the processing capabilities of ESNs with those of bigram and trigram models. Because of some unexpected regularities in Elman's grammar, these statistical techniques can maintain dependencies over greater distances than one might initially expect. However, we show that the memory of ESNs on this word-prediction task, although noisy, extends significantly beyond that of bigrams and trigrams, enabling ESNs to predict verb agreement well at distances over which those methods operate at chance. Overall, our results indicate a surprising ability of ESNs to learn a grammar, suggesting that they form useful internal representations without learning them.
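
The architectural point at the heart of the abstract, that an ESN keeps its input and recurrent (reservoir) weights fixed and trains only a linear readout, can be illustrated with a minimal next-word-prediction sketch. This is not the authors' implementation: the toy corpus, reservoir size, spectral radius, and ridge parameter below are illustrative assumptions, and NumPy is the only dependency.

# Minimal sketch of ESN next-word prediction: the reservoir is random and
# fixed (no BPTT), and only the linear readout W_out is trained.
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus (assumed); in the paper the sentences come from Elman's grammar.
corpus = "boys see dogs . girls who chase cats see dogs .".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, N = len(vocab), 100            # vocabulary size, reservoir size (assumed)

# Fixed random input and reservoir weights; these are never trained.
W_in = rng.uniform(-0.5, 0.5, (N, V))
W = rng.uniform(-0.5, 0.5, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))   # keep spectral radius below 1

def one_hot(word):
    v = np.zeros(V)
    v[idx[word]] = 1.0
    return v

# Drive the reservoir with the corpus, collecting states and next-word targets.
x = np.zeros(N)
states, targets = [], []
for t in range(len(corpus) - 1):
    x = np.tanh(W_in @ one_hot(corpus[t]) + W @ x)
    states.append(x.copy())
    targets.append(one_hot(corpus[t + 1]))
X, Y = np.array(states), np.array(targets)

# Train only the readout, here with ridge regression (the "training less" step).
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(N), X.T @ Y).T

# Predicted next-word scores after the first three words of the toy corpus.
scores = W_out @ X[2]
print("after", corpus[:3], "->", vocab[int(np.argmax(scores))])

An SRN trained with BPTT would instead update W_in and W themselves; the comparison in the paper asks how much is lost when those internal weights stay random.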
[1] Stefan L. Frank, et al. Learn more by training less: systematicity in sentence processing by recurrent networks. Connection Science, 2006.
[2] J. Elman. Distributed Representations, Simple Recurrent Networks, And Grammatical Structure. 1991.
[3] Paul-Gerhard Plöger, et al. Echo State Networks used for Motor Control. Proceedings of the 2005 IEEE International Conference on Robotics and Automation, 2005.
[4] J. Elman. Learning and development in neural networks: the importance of starting small. Cognition, 1993.
[5] Stefan L. Frank. Strong Systematicity in Sentence Processing by an Echo State Network. ICANN, 2006.
[6] Herbert Jaeger, et al. The "echo state" approach to analysing and training recurrent neural networks. 2001.
[7] Herbert Jaeger, et al. A tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach. 2005.
[8] Henry Markram, et al. Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations. Neural Computation, 2002.