Provably Stable Interpretable Encodings of Context Free Grammars in RNNs with a Differentiable Stack

Given a collection of strings belonging to a context-free grammar (CFG) and another collection of strings not belonging to the CFG, how might one infer the grammar? This is the problem of grammatical inference. Since the languages generated by CFGs are exactly those recognized by pushdown automata (PDAs), it suffices to determine the state transition rules and stack action rules of the corresponding PDA. One approach is to train a recurrent neural network (RNN) to classify the sample data and then attempt to extract these PDA rules from it. But a generic neural network is not a priori aware of the structure of a PDA and would likely require many samples to infer that structure; furthermore, extracting the PDA rules from a trained RNN is nontrivial. We instead build an RNN explicitly structured like a PDA, so that its weights correspond directly to the PDA rules. This requires a stack architecture that is both differentiable (to enable gradient-based learning) and stable (an unstable stack shows deteriorating performance on longer strings). We propose a stack architecture that is differentiable and that provably exhibits orbital stability. Using this stack, we construct a neural network that provably approximates a PDA on strings of arbitrary length. Moreover, our model and method of proof generalize readily to other state machines, such as a Turing machine.
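To make the idea of a differentiable stack concrete, below is a minimal sketch in the style of stack-augmented RNNs: the next stack state is a convex combination of the hard push, pop, and no-op results, weighted by action probabilities emitted by the controller, so gradients flow through the stack actions. This is an illustrative assumption, not the paper's specific architecture or its stability construction; the framework (PyTorch) and the names `soft_stack_step`, `push_val`, and `action_probs` are hypothetical.

```python
import torch

def soft_stack_step(stack, push_val, action_probs):
    """One soft (differentiable) stack update.

    stack:        (depth, dim) tensor; row 0 is the top of the stack.
    push_val:     (dim,) symbol embedding to push.
    action_probs: (3,) probabilities for (push, pop, no-op).

    Returns a convex combination of the three hard stack actions,
    so gradients reach the action probabilities.
    """
    a_push, a_pop, a_noop = action_probs
    # Push: shift every slot down by one and write push_val on top.
    pushed = torch.cat([push_val.unsqueeze(0), stack[:-1]], dim=0)
    # Pop: shift every slot up by one and pad the bottom with zeros.
    popped = torch.cat([stack[1:], torch.zeros(1, stack.shape[1])], dim=0)
    return a_push * pushed + a_pop * popped + a_noop * stack

# Usage: a depth-8 stack over 4-dimensional symbol embeddings.
stack = torch.zeros(8, 4)
symbol = torch.tensor([1.0, 0.0, 0.0, 0.0])
logits = torch.tensor([2.0, -1.0, 0.0], requires_grad=True)  # hypothetical controller output
probs = torch.softmax(logits, dim=0)
stack = soft_stack_step(stack, symbol, probs)
stack.sum().backward()  # gradients propagate back to the action logits
print(probs, stack[0])
```

Because the soft update mixes stack configurations rather than choosing one, small errors can accumulate over long strings; the stability property claimed in the abstract is what prevents this deterioration, and it is not captured by this sketch.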
