Evaluating the Ability of LSTMs to Learn Context-Free Grammars

While long short-term memory (LSTM) neural network architectures are designed to capture sequence information, human language is generally composed of hierarchical structures. This raises the question of whether LSTMs can learn such hierarchical structures. We explore this question with a well-formed bracket prediction task over two types of brackets, modeled by an LSTM. Demonstrating that an LSTM can learn such a system is a first step toward demonstrating that the entire class of context-free languages (CFLs) is learnable. We observe that the model requires memory that grows exponentially with the number of characters and the embedding depth, where sub-linear memory should suffice. Still, the model does more than memorize the training input: it learns to distinguish relevant from irrelevant information. On the other hand, we also observe that the model does not generalize well. We conclude that LSTMs do not learn the relevant underlying context-free rules, and that their good overall performance is instead attained by an efficient way of evaluating nuisance variables. LSTMs are a way to quickly reach good results on many natural language tasks, but to understand and generate natural language one must investigate other approaches that make more direct use of language's structural nature.

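Below is a minimal sketch of the kind of two-bracket prediction setup the abstract describes: generating well-formed strings over two bracket types (a Dyck-2-style language) and training a character-level LSTM on them. This is an illustrative assumption, not the authors' actual experimental setup; the names (`gen_dyck`, `BracketLSTM`), the generation procedure, and all hyperparameters are invented here, and the loss is plain next-character prediction rather than any task-specific evaluation of closing brackets.

```python
# Illustrative sketch (PyTorch assumed); not the paper's exact task or model.
import random
import torch
import torch.nn as nn

PAIRS = [("(", ")"), ("[", "]")]
VOCAB = {c: i for i, c in enumerate("()[]")}


def gen_dyck(max_len=20):
    """Generate one balanced string over two bracket types."""
    s, stack = [], []
    while len(s) + len(stack) < max_len:
        if stack and random.random() < 0.5:
            s.append(stack.pop())          # close the most recently opened bracket
        else:
            o, c = random.choice(PAIRS)
            s.append(o)
            stack.append(c)                # remember the matching closer
    while stack:                           # close everything that is still open
        s.append(stack.pop())
    return "".join(s)


class BracketLSTM(nn.Module):
    """Character-level LSTM that reads a prefix and predicts the next symbol."""
    def __init__(self, vocab=4, emb=16, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.out(h)                 # next-symbol logits at every position


model = BracketLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):                    # tiny illustrative training loop
    seq = gen_dyck()
    ids = torch.tensor([[VOCAB[c] for c in seq]])
    logits = model(ids[:, :-1])            # predict each following character
    loss = loss_fn(logits.reshape(-1, 4), ids[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Probing how the trained model handles deeper nesting or longer sequences than seen in training is the kind of generalization test the abstract refers to; the sketch above only sets up the data and model.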