Surya Ganguli | Christopher D. Manning | John Hewitt | Percy Liang | Michael Hahn
[1] Ngoc Thang Vu, et al. Learning the Dyck Language with Attention-based Seq2Seq Models, 2019, BlackboxNLP@ACL.
[2] Lane Schwartz, et al. Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction, 2018, EMNLP.
[3] Yonatan Belinkov, et al. On Evaluating the Generalization of LSTM Models in Formal Languages, 2018, ArXiv.
[4] Edouard Grave, et al. Colorless Green Recurrent Networks Dream Hierarchically, 2018, NAACL.
[5] Yonatan Belinkov, et al. LSTM Networks Can Perform Dynamic Counting, 2019, Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges.
[6] C. Lee Giles, et al. Constructing deterministic finite-state automata in recurrent neural networks, 1996, JACM.
[7] William Merrill, et al. Sequential Neural Networks as Automata, 2019, Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges.
[8] Noam Chomsky. Three models for the description of language, 1956, IRE Trans. Inf. Theory.
[9] Fei-Fei Li, et al. Visualizing and Understanding Recurrent Networks, 2015, ArXiv.
[10] Marco Baroni, et al. The emergence of number and syntax units in LSTM language models, 2019, NAACL.
[11] Ran El-Yaniv, et al. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations, 2016, J. Mach. Learn. Res.
[12] David J. Weir, et al. The convergence of mildly context-sensitive grammar formalisms, 1990.
[13] Robert Frank, et al. Context-Free Transductions with Neural Stacks, 2018, BlackboxNLP@EMNLP.
[14] Jeffrey L. Elman. Finding Structure in Time, 1990, Cogn. Sci.
[15] Samuel A. Korsky, et al. On the Computational Power of RNNs, 2019, ArXiv.
[16] C. L. Giles, et al. Machine learning using higher order correlation networks, 1986.
[17] Philip Resnik, et al. Left-Corner Parsing and Psychological Plausibility, 1992, COLING.
[18] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[19] Piotr Indyk. Optimal Simulation of Automata by Neural Nets, 1995, STACS.
[20] Christopher D. Manning, et al. A Structural Probe for Finding Syntax in Word Representations, 2019, NAACL.
[21] C. Lee Giles, et al. Higher Order Recurrent Networks and Grammatical Inference, 1989, NIPS.
[22] Eran Yahav, et al. On the Practical Computational Power of Finite Precision RNNs for Language Recognition, 2018, ACL.
[23] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[24] Tal Linzen, et al. Targeted Syntactic Evaluation of Language Models, 2018, EMNLP.
[25] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[26] Eran Yahav, et al. A Formal Hierarchy of RNN Architectures, 2020, ACL.
[27] Tal Linzen, et al. Modeling garden path effects without explicit hierarchical syntax, 2018, CogSci.
[28] Doina Precup, et al. Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning, 2018, AISTATS.
[29] Robert C. Berwick, et al. Evaluating the Ability of LSTMs to Learn Context-Free Grammars, 2018, BlackboxNLP@EMNLP.
[30] Emmanuel Dupoux, et al. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies, 2016, TACL.
[31] Phil Blunsom, et al. Learning to Transduce with Unbounded Memory, 2015, NIPS.
[32] Noam Chomsky, et al. The Algebraic Theory of Context-Free Languages, 1963.
[33] William Merrill, et al. On the Linguistic Capacity of Real-Time Counter Automata, 2020, ArXiv.
[34] Omer Levy, et al. Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum, 2018, ACL.
[35] Naftali Tishby, et al. Opening the Black Box of Deep Neural Networks via Information, 2017, ArXiv.
[36] Don R. Hush, et al. Bounds on the complexity of recurrent neural network implementations of finite state machines, 1993, Neural Networks.
[37] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[38] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[39] Noam Chomsky. On Certain Formal Properties of Grammars, 1959, Inf. Control.
[40] Tomas Mikolov, et al. Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets, 2015, NIPS.
[41] G. A. Miller, et al. Finitary models of language users, 1963.
[42] John Hale, et al. LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better, 2018, ACL.
[43] Daniel Jurafsky, et al. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context, 2018, ACL.
[44] Hava T. Siegelmann, et al. On the Computational Power of Neural Nets, 1995, J. Comput. Syst. Sci.