Neural Language Modeling by Jointly Learning Syntax and Lexicon

We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages this structural information to form better semantic representations and to improve language modeling. Standard recurrent neural networks are limited by their sequential structure and fail to use syntactic information efficiently, while tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language-model loss into the neural parsing network. Experiments show that the proposed model discovers the underlying syntactic structure and achieves state-of-the-art performance on word- and character-level language modeling tasks.
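To make the parsing/reading/predict interaction concrete, below is a minimal, self-contained PyTorch sketch of the idea. It is not the authors' implementation: the class name ToyPRPN, the single-linear parse_net, the GRU-based reader, and the cumulative-product attention mask are simplifying assumptions chosen only to illustrate how a differentiable structure signal lets the language-model loss train the parsing network end to end.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyPRPN(nn.Module):
    """Illustrative sketch only; names and architecture are assumptions, not the authors' code."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # "Parsing" network: predicts a soft constituent-boundary score per token.
        self.parse_net = nn.Linear(dim, 1)
        # "Reading" network: consumes the current token plus structure-weighted context.
        self.read_cell = nn.GRUCell(2 * dim, dim)
        # "Predict" head: next-token distribution.
        self.predict = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                                    # tokens: (batch, seq_len)
        x = self.embed(tokens)
        batch, seq_len, dim = x.shape
        boundary = torch.sigmoid(self.parse_net(x)).squeeze(-1)   # (batch, seq_len) in (0, 1)
        keep = 1.0 - boundary                                     # how freely attention may pass token j
        h = x.new_zeros(batch, dim)
        states, logits = [], []
        for t in range(seq_len):
            if states:
                mem = torch.stack(states, dim=1)                  # (batch, t, dim)
                scores = torch.einsum('bd,btd->bt', h, mem) / dim ** 0.5
                # Soft structural mask: attention to position i is damped by the product
                # of keep_j for every j between i and t (a simplification of gates
                # derived from induced syntactic structure).
                if t > 1:
                    rev = torch.flip(keep[:, 1:t], dims=[1])
                    mask = torch.flip(torch.cumprod(rev, dim=1), dims=[1])
                    mask = torch.cat([mask, torch.ones_like(mask[:, :1])], dim=1)
                else:
                    mask = torch.ones_like(keep[:, :1])
                probs = torch.softmax(scores, dim=-1) * mask
                probs = probs / (probs.sum(dim=-1, keepdim=True) + 1e-8)
                ctx = torch.einsum('bt,btd->bd', probs, mem)
            else:
                ctx = torch.zeros_like(h)
            h = self.read_cell(torch.cat([x[:, t], ctx], dim=-1), h)
            states.append(h)
            logits.append(self.predict(h))
        return torch.stack(logits, dim=1), boundary

# Train with an ordinary next-word objective; the loss gradient reaches parse_net
# directly, which is the "back-propagated into the parsing network" property.
model = ToyPRPN(vocab_size=1000)
tokens = torch.randint(0, 1000, (8, 20))
logits, boundary = model(tokens)
loss = F.cross_entropy(logits[:, :-1].reshape(-1, 1000), tokens[:, 1:].reshape(-1))
loss.backward()
print(model.parse_net.weight.grad is not None)                    # True
```

The point of the sketch is that the structural mask is a smooth function of the parser's outputs, so no structural supervision or discrete parsing decisions are needed during training; the actual PRPN derives its gates from syntactic distances rather than this simplified cumulative-product rule.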
