Colorless Green Recurrent Networks Dream Hierarchically

Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation nonsensical sentences where RNNs cannot rely on semantic or lexical cues (“The colorless green ideas I ate with the chair sleep furiously”), and, for Italian, we compare model performance to human intuitions. Our language-model-trained RNNs make reliable predictions about long-distance agreement, and do not lag much behind human performance. We thus bring support to the hypothesis that RNNs are not just shallow-pattern extractors, but also acquire deeper grammatical competence.
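The agreement test described above can be summarized in code. The sketch below is a minimal illustration, not the authors' implementation: it assumes a small PyTorch LSTM language model and a hypothetical toy vocabulary, and checks whether the model assigns higher probability to the grammatically correct verb form than to the incorrect one at the agreement site, given the sentence prefix up to the verb.

```python
# Illustrative sketch (not the paper's code): score a long-distance agreement
# item by comparing the LM probability of the correct vs. incorrect verb form
# at the target position. Model and vocabulary here are hypothetical toys.
import torch
import torch.nn as nn

class WordLM(nn.Module):
    """A small word-level LSTM language model."""
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, ids):
        h, _ = self.lstm(self.embed(ids))
        return self.out(h)  # next-word logits at each position

def prefers_grammatical(model, vocab, prefix, correct_verb, wrong_verb):
    """True if the LM gives the grammatical verb form higher probability."""
    ids = torch.tensor([[vocab[w] for w in prefix]])
    with torch.no_grad():
        logits = model(ids)[0, -1]                  # distribution after the prefix
        log_probs = torch.log_softmax(logits, dim=-1)
    return (log_probs[vocab[correct_verb]] > log_probs[vocab[wrong_verb]]).item()

# Toy usage; a real evaluation would use a trained model and corpus-extracted
# (or nonce, "colorless green") agreement test sentences.
words = ["the", "colorless", "green", "ideas", "i", "ate", "with", "chair",
         "sleep", "sleeps", "furiously"]
vocab = {w: i for i, w in enumerate(words)}
model = WordLM(len(vocab))
prefix = ["the", "colorless", "green", "ideas", "i", "ate", "with", "the", "chair"]
print(prefers_grammatical(model, vocab, prefix, "sleep", "sleeps"))
```

Accuracy over a set of such items (the fraction where the grammatical form wins) is the kind of agreement score the abstract refers to; the nonsensical variants remove lexical and semantic cues so that only syntactic tracking can drive the preference.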
