Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach

It is commonly believed that knowledge of syntactic structure should improve language modeling, yet incorporating syntactic structure into neural language models both effectively and efficiently has remained a challenge. In this paper, we use a multi-task objective: the model simultaneously predicts words and ground-truth parse trees encoded as "syntactic distances", with the two objectives sharing the same intermediate representation. Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground-truth parse trees are provided as an additional training signal, the model achieves lower perplexity and induces trees of higher quality.
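As a rough illustration of the "syntactic distance" encoding the abstract refers to: for a binarized parse tree over n words, one common formulation assigns to each of the n-1 adjacent word pairs a distance proportional to the height of their lowest common ancestor, so the tree can be represented as a flat sequence of scalars that a sequence model can predict. The sketch below (not the paper's implementation; the nested-tuple tree format and function name are illustrative assumptions) computes such distances recursively.

```python
def tree_to_distances(tree):
    """Convert a binarized parse tree (nested 2-tuples with string
    leaves) into a list of n-1 syntactic distances for n leaves.

    The distance at each split point is the height of the subtree
    rooted at that split, so adjacent words joined lower in the tree
    get smaller distances. Returns (distances, height).
    """
    if isinstance(tree, str):
        # A leaf contributes no split point and has height 0.
        return [], 0
    left, right = tree
    left_dists, left_h = tree_to_distances(left)
    right_dists, right_h = tree_to_distances(right)
    height = max(left_h, right_h) + 1
    # The split between the two children sits at this node's height.
    return left_dists + [height] + right_dists, height


# "((the cat) (sat (on mat)))" -> distances [1, 3, 2, 1]:
# "the/cat" and "on/mat" merge lowest, the sentence-level split is highest.
distances, _ = tree_to_distances((("the", "cat"), ("sat", ("on", "mat"))))
```

Because the distances form an ordinary real-valued sequence, predicting them can share the language model's hidden states, which is what makes the multi-task setup cheap; a greedy top-down split at the largest remaining distance recovers the tree.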
