Neural Language Modeling by Jointly Learning Syntax and Lexicon

We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages structural information to form better semantic representations and to improve language modeling. Standard recurrent neural networks are limited by their sequential structure and fail to use syntactic information efficiently. Tree-structured recursive networks, on the other hand, usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Network (PRPN), that can simultaneously induce syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be back-propagated directly from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieves state-of-the-art performance on word- and character-level language modeling tasks.
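To make the end-to-end idea concrete, the sketch below illustrates (in a heavily simplified, hypothetical form, not the authors' implementation) how a differentiable parsing module can gate a recurrent reader and be trained purely from the language-model loss. All module names, sizes, and the particular gating function are assumptions made for this toy example.

```python
# Toy sketch of the PRPN idea: a "parsing" module produces soft, differentiable
# syntactic-distance scores that gate how far back the "reading" module attends,
# and everything is trained only with the next-word prediction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyPRPN(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Parsing network: a small convolution over embeddings that outputs one
        # soft syntactic-distance score per position (differentiable by design).
        self.parse = nn.Conv1d(emb_dim, 1, kernel_size=3, padding=1)
        # Reading network: a plain recurrent cell; past hidden states are
        # summarized with attention weights gated by the parsing scores.
        self.cell = nn.GRUCell(emb_dim, hid_dim)
        # Predict network: next-token distribution from the attended summary.
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        emb = self.embed(tokens)                            # (B, T, E)
        dist = self.parse(emb.transpose(1, 2)).squeeze(1)   # (B, T) soft distances
        B, T, _ = emb.shape
        h = emb.new_zeros(B, self.cell.hidden_size)
        memories, logits = [], []
        for t in range(T):
            h = self.cell(emb[:, t], h)
            memories.append(h)
            mem = torch.stack(memories, dim=1)              # (B, t+1, H)
            # Soft gate: past positions with larger distance scores than the
            # current one are softly blocked, mimicking a constituent boundary.
            gate = torch.sigmoid(dist[:, t:t + 1] - dist[:, :t + 1])  # (B, t+1)
            attn = F.softmax(gate, dim=-1).unsqueeze(-1)
            summary = (attn * mem).sum(dim=1)               # (B, H)
            logits.append(self.out(summary))
        return torch.stack(logits, dim=1)                   # (B, T, V)


# The language-model loss alone drives learning of the parsing scores,
# because the gates are differentiable functions of them.
model = ToyPRPN(vocab_size=1000)
x = torch.randint(0, 1000, (2, 10))
logits = model(x[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 1000), x[:, 1:].reshape(-1))
loss.backward()
```

The point of the sketch is only the gradient path: because the gates are smooth functions of the parsing scores, the syntactic-distance predictor receives gradients from the language-model loss without any tree supervision.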
