RNNs Can Generate Bounded Hierarchical Languages with Optimal Memory

Recurrent neural networks empirically generate natural language with high syntactic fidelity. However, their success is not well-understood theoretically. We provide theoretical insight into this success, proving in a finite-precision setting that RNNs can efficiently generate bounded hierarchical languages that reflect the scaffolding of natural language syntax. We introduce Dyck-($k$,$m$), the language of well-nested brackets (of $k$ types) and $m$-bounded nesting depth, reflecting the bounded memory needs and long-distance dependencies of natural language syntax. The best known results use $O(k^{\frac{m}{2}})$ memory (hidden units) to generate these languages. We prove that an RNN with $O(m \log k)$ hidden units suffices, an exponential reduction in memory, by an explicit construction. Finally, we show that no algorithm, even with unbounded computation, can suffice with $o(m \log k)$ hidden units.
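As a minimal sketch of the language being studied (a plain Python checker, assuming nothing beyond the definition above and not the paper's RNN construction), the following illustrates Dyck-($k$,$m$): the only state needed is a stack of at most $m$ open-bracket types, each an index in $\{0, \dots, k-1\}$, so the state fits in roughly $m \lceil \log_2 k \rceil$ bits, which is the intuition behind the $O(m \log k)$ upper bound and the matching lower bound stated above.

```python
from typing import List, Tuple

def is_dyck_km(tokens: List[Tuple[str, int]], k: int, m: int) -> bool:
    """Check membership in Dyck-(k, m): well-nested brackets of k types
    with nesting depth at most m.

    Each token is ('open', t) or ('close', t) with bracket type t in [0, k).
    This is a stack-based sketch for illustration only; it is not the
    paper's RNN construction. It shows that the relevant state is a stack
    of at most m bracket types, i.e. about m * ceil(log2 k) bits.
    """
    stack: List[int] = []
    for kind, t in tokens:
        if not (0 <= t < k):
            return False
        if kind == 'open':
            if len(stack) == m:               # would exceed the depth bound m
                return False
            stack.append(t)
        elif kind == 'close':
            if not stack or stack[-1] != t:   # unmatched or mismatched close
                return False
            stack.pop()
        else:
            return False
    return len(stack) == 0                    # every bracket must be closed

# Example: "( [ ] )" with k = 2 bracket types is in Dyck-(2, 2).
print(is_dyck_km([('open', 0), ('open', 1), ('close', 1), ('close', 0)], k=2, m=2))  # True
# Nesting to depth 3 violates the depth bound m = 2.
print(is_dyck_km([('open', 0), ('open', 0), ('open', 0)], k=2, m=2))  # False
```

The stack contents (at most $m$ symbols, each one of $k$ types) are exactly what any generator must remember at each position, which is why no algorithm with $o(m \log k)$ hidden units can suffice.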
