The Neural State Pushdown Automata

In order to learn complex grammars, recurrent neural networks (RNNs) require sufficient computational resources to ensure correct grammar recognition. A widely used approach to expanding model capacity is to couple an RNN to an external memory stack. Here, we introduce the neural state pushdown automaton (NSPDA), which couples a neural network state machine to a digital stack rather than an analog one. We empirically demonstrate its effectiveness in recognizing various context-free grammars (CFGs). First, we develop the underlying mechanics of the proposed higher-order recurrent network and its manipulation of the stack, as well as how to stably program its underlying pushdown automaton (PDA) to achieve the desired finite-state network dynamics. Next, we introduce a noise regularization scheme for higher-order (tensor) networks, to our knowledge the first of its kind, and design an algorithm for improved incremental learning. Finally, we design a method for inserting grammar rules into an NSPDA and empirically show that this prior knowledge improves training convergence time by an order of magnitude and, in some cases, leads to better generalization. The NSPDA is also compared to a classical analog stack neural network pushdown automaton (NNPDA) as well as a wide array of first- and second-order RNNs with and without external memory, trained using different learning algorithms. Our results show that, for Dyck(2) languages, prior rule-based knowledge is critical for optimization convergence and for ensuring generalization to longer sequences at test time. We observe that many RNNs, with or without memory but lacking prior knowledge, fail to converge and generalize poorly on CFGs.
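To make the architectural idea concrete, the sketch below illustrates one plausible reading of a higher-order (tensor) recurrent state machine coupled to a digital stack: a third-order transition tensor combines the previous state, the current input symbol, and the stack top, and a hard argmax selects a discrete push/pop/no-op action instead of the soft, analog stack updates used in an NNPDA. This is a minimal illustrative sketch under those assumptions, not the authors' implementation; all names (e.g., NeuralStatePushdownSketch, n_states, the action head A) are hypothetical.

```python
# Illustrative sketch (not the paper's code): a higher-order recurrent state
# machine coupled to a *digital* stack with discrete push/pop/no-op actions.
import numpy as np

class NeuralStatePushdownSketch:
    def __init__(self, n_states=8, n_input=3, n_stack=3, seed=0):
        rng = np.random.default_rng(seed)
        self.n_states, self.n_input, self.n_stack = n_states, n_input, n_stack
        # Third-order transition tensor: (next state) x (state, input, stack top)
        self.W = rng.normal(0.0, 0.1, (n_states, n_states, n_input, n_stack + 1))
        # Action head: state -> scores for {push, pop, no-op}
        self.A = rng.normal(0.0, 0.1, (3, n_states))
        self.state = np.zeros(n_states)
        self.state[0] = 1.0          # start in a one-hot "initial" state
        self.stack = []              # digital stack of exact symbol ids

    def step(self, input_id):
        # Reserve an extra id (= n_stack) to encode an empty stack.
        top = self.stack[-1] if self.stack else self.n_stack
        x = np.zeros(self.n_input); x[input_id] = 1.0
        t = np.zeros(self.n_stack + 1); t[top] = 1.0
        # Higher-order update: h_{t+1} = sigmoid( W . (h_t (x) x_t (x) top_t) )
        pre = np.einsum('ijkl,j,k,l->i', self.W, self.state, x, t)
        self.state = 1.0 / (1.0 + np.exp(-pre))
        # Discrete stack action via hard argmax (digital, not analog mixture).
        action = int(np.argmax(self.A @ self.state))
        if action == 0:                   # push the current input symbol
            self.stack.append(input_id)
        elif action == 1 and self.stack:  # pop
            self.stack.pop()
        return self.state, list(self.stack)

# Usage: feed a short symbol sequence and inspect the discrete stack trace.
m = NeuralStatePushdownSketch()
for sym in [0, 1, 1, 2]:
    h, stack = m.step(sym)
print("final stack:", stack)
```

The key contrast with an analog NNPDA-style stack is that nothing fractional is ever written: the stack only ever holds exact symbols, which is what makes it possible in principle to program PDA transition rules directly into the weights as prior knowledge.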
