Unsupervised Recurrent Neural Network Grammars

Recurrent neural network grammars (RNNGs) are generative models of language that jointly model syntax and surface structure by incrementally generating a syntax tree and a sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.
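For concreteness, the evidence lower bound (ELBO) the abstract refers to can be written out as follows. This is a minimal sketch in standard variational-inference notation (the symbols are ours, not taken from the paper): let $x$ be the observed sentence, $z \in \mathcal{T}(x)$ a latent constituency tree over $x$, $p_\theta$ the RNNG generative model, and $q_\phi(z \mid x)$ the CRF inference network. Then

    \log p_\theta(x)
      \;=\; \log \sum_{z \in \mathcal{T}(x)} p_\theta(x, z)
      \;\geq\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x, z) - \log q_\phi(z \mid x) \right]
      \;=\; \mathrm{ELBO}(\theta, \phi; x).

The sum over $\mathcal{T}(x)$ grows exponentially in sentence length, which is why direct marginalization is intractable; the expectation under $q_\phi$ can instead be estimated by sampling trees from the inference network, and a CRF parser's tree-structured factorization in general keeps such sampling (and entropy computation) tractable via inside-style dynamic programming.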
