Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack

Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction. Despite success in applications such as machine translation and speech recognition, these stateful models have several critical shortcomings. In particular, RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time-series forecasting problems. For example, RNNs struggle to recognize complex context-free languages (CFLs), never reaching 100% accuracy even on the training data. One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack. However, differentiable memories in prior work have neither been studied extensively on CFLs nor tested on sequences longer than those seen during training. The few efforts that have studied them show that continuous differentiable memory structures yield poor generalization on complex CFLs and make the RNN less interpretable. In this paper, we improve the memory-augmented RNN with architectural and state-updating mechanisms that ensure the model learns to properly balance the use of its latent states with external memory. Our improved RNN models exhibit better generalization and are able to classify long strings generated by complex hierarchical context-free grammars (CFGs). We evaluate our models on CFGs, including the Dyck languages, as well as on the Penn Treebank language modeling task, and achieve stable, robust performance across these benchmarks. Furthermore, we show that only our memory-augmented networks are capable of retaining memory over long durations, up to strings of length 160.
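As a rough illustration of the kind of external memory the abstract refers to (not the paper's exact architecture), the sketch below shows how a continuous, differentiable stack can be updated by soft push/pop/no-op weights emitted by an RNN controller, in the spirit of stack-augmented RNNs such as Joulin & Mikolov (2015) and Grefenstette et al. (2015). The class, method names, and blending scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class DifferentiableStack:
    """Continuous stack: each cell holds a vector plus a 'strength' in [0, 1].

    Push, pop, and no-op candidate stacks are blended by soft action weights,
    so the whole update is differentiable with respect to the controller's outputs.
    (Hypothetical sketch; names and details are illustrative.)
    """
    def __init__(self, depth, width):
        self.values = np.zeros((depth, width))   # stack cells, index 0 = top
        self.strengths = np.zeros(depth)         # how "present" each cell is

    def update(self, action_logits, push_value):
        # action_logits -> (push, pop, no-op) probabilities
        a_push, a_pop, a_noop = softmax(action_logits)

        # pushed configuration: shift everything down, new value on top
        pushed_vals = np.vstack([push_value, self.values[:-1]])
        pushed_str = np.concatenate([[1.0], self.strengths[:-1]])

        # popped configuration: shift everything up, bottom becomes empty
        popped_vals = np.vstack([self.values[1:], np.zeros_like(push_value)])
        popped_str = np.concatenate([self.strengths[1:], [0.0]])

        # blend the three candidate stacks by the soft action weights
        self.values = a_push * pushed_vals + a_pop * popped_vals + a_noop * self.values
        self.strengths = a_push * pushed_str + a_pop * popped_str + a_noop * self.strengths

    def read(self):
        # soft read of the top cell, weighted by its strength;
        # this vector would be fed back into the RNN's next state update
        return self.strengths[0] * self.values[0]

# Example: one controller step pushing a 4-dimensional vector
stack = DifferentiableStack(depth=8, width=4)
stack.update(np.array([2.0, -1.0, -1.0]), push_value=np.random.randn(4))
top = stack.read()
```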
