Reservoir Stack Machines

Memory-augmented neural networks equip a recurrent neural network with an explicit memory to support tasks that require storing information without interference over long time spans. A key motivation for such research is to perform classic computation tasks, such as parsing. However, memory-augmented neural networks are notoriously hard to train, requiring many backpropagation epochs and large amounts of data. In this paper, we introduce the reservoir stack machine, a model which can provably recognize all deterministic context-free languages and circumvents the training problem by training only the output layer of a recurrent net and by employing auxiliary information during training about the desired interaction with a stack. In our experiments, we validate the reservoir stack machine against deep and shallow networks from the literature on three benchmark tasks for Neural Turing machines and six deterministic context-free languages. Our results show that the reservoir stack machine achieves zero error, even on test sequences longer than the training data, requiring only a few seconds of training time and 100 training sequences.
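To illustrate the reservoir-computing principle the abstract refers to (only the output layer of a recurrent net is trained), the following minimal sketch fits a linear readout on the states of a fixed random recurrent network via ridge regression. The stack module and the auxiliary supervision of stack actions are deliberately omitted, and all class and parameter names here are hypothetical, not the authors' implementation.

```python
# Minimal reservoir-computing sketch (assumption: illustrative only).
# A fixed random recurrent "reservoir" encodes the input sequence; only a
# linear readout is trained, here by ridge regression in closed form.
import numpy as np

class EchoStateReadout:
    def __init__(self, n_in, n_res=200, spectral_radius=0.9, ridge=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
        W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
        # Rescale recurrent weights to the desired spectral radius so the
        # reservoir (approximately) satisfies the echo state property.
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W = W
        self.ridge = ridge
        self.W_out = None

    def _states(self, X):
        # X: array of shape (T, n_in), e.g. one-hot encoded input symbols.
        h = np.zeros(self.W.shape[0])
        states = []
        for x in X:
            h = np.tanh(self.W_in @ x + self.W @ h)
            states.append(h)
        return np.stack(states)  # shape (T, n_res)

    def fit(self, sequences, targets):
        # Collect reservoir states over all training sequences and solve a
        # single regularized least-squares problem for the readout weights.
        H = np.vstack([self._states(X) for X in sequences])
        Y = np.vstack(targets)
        A = H.T @ H + self.ridge * np.eye(H.shape[1])
        self.W_out = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, X):
        return self._states(X) @ self.W_out
```

In a reservoir stack machine, analogous linear readouts would additionally predict the desired stack interaction (e.g. push and pop decisions), fitted against the auxiliary supervision signal mentioned in the abstract rather than learned end-to-end by backpropagation.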
