Compound Probabilistic Context-Free Grammars for Grammar Induction

We study a formalization of the grammar induction problem that models sentences as being generated by a compound probabilistic context-free grammar. In contrast to traditional formulations that learn a single stochastic grammar, our context-free rule probabilities are modulated by a per-sentence continuous latent variable, which induces marginal dependencies beyond the traditional context-free assumptions. Inference in this context-dependent grammar is performed by collapsed variational inference, in which an amortized variational posterior is placed on the continuous variable and the latent trees are marginalized out with dynamic programming. Experiments on English and Chinese show the effectiveness of our approach compared to recent state-of-the-art methods that induce grammars from words with neural language models.
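
The abstract describes two pieces that a short sketch can make concrete: rule probabilities that are modulated by a per-sentence continuous latent variable, and marginalization of the latent trees by dynamic programming (the inside algorithm). The code below is a minimal illustrative sketch under stated assumptions, not the paper's implementation: the grammar sizes, the toy linear parameterization from z to rule scores, the start-symbol convention, and the Monte Carlo estimate over a standard normal prior are all assumptions made for the example, and names such as rule_probs and inside_log_prob are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

NT = 4       # number of nonterminals (assumption for this toy sketch)
T_SYM = 8    # number of terminal symbols (assumption)
Z_DIM = 16   # dimensionality of the per-sentence latent variable (assumption)

# Toy parameterization: linear maps from z to unnormalized rule scores.
# A full model would use neural networks here.
W_binary = rng.normal(size=(Z_DIM, NT, NT, NT))   # scores for rules A -> B C
W_term = rng.normal(size=(Z_DIM, NT, T_SYM))      # scores for rules A -> w

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def rule_probs(z):
    """Rule probabilities modulated by the sentence-level latent vector z."""
    binary = np.einsum("d,dabc->abc", z, W_binary)   # (NT, NT, NT)
    term = np.einsum("d,dat->at", z, W_term)         # (NT, T_SYM)
    # Each left-hand side A distributes probability mass over all its right-hand sides.
    scores = np.concatenate([binary.reshape(NT, -1), term], axis=1)
    probs = softmax(scores, axis=1)
    return (probs[:, :NT * NT].reshape(NT, NT, NT),  # P(A -> B C | z)
            probs[:, NT * NT:])                      # P(A -> w | z)

def inside_log_prob(sentence, z):
    """log p(sentence | z): sum over all binary trees with the inside algorithm."""
    binary, term = rule_probs(z)
    n = len(sentence)
    # beta[i, j, A] = p(A derives words i..j inclusive | z)
    beta = np.zeros((n, n, NT))
    for i, w in enumerate(sentence):
        beta[i, i] = term[:, w]
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                # sum over split point k and child nonterminals B, C
                beta[i, j] += np.einsum("abc,b,c->a", binary,
                                        beta[i, k], beta[k + 1, j])
    # Assume nonterminal 0 is the start symbol in this sketch.
    return np.log(beta[0, n - 1, 0] + 1e-30)

# Crude Monte Carlo estimate of log p(sentence) with z drawn from a standard
# normal prior; sentence is a toy sequence of terminal indices.
sentence = [1, 3, 2, 7, 0]
samples = [inside_log_prob(sentence, rng.normal(size=Z_DIM)) for _ in range(64)]
print("estimated log-likelihood:", np.log(np.mean(np.exp(samples))))

In the model the abstract describes, the prior samples above would instead be draws from an amortized variational posterior q(z | sentence), and the inside score would enter a variational training objective; the sketch only illustrates how conditioning the rule probabilities on z and marginalizing trees with dynamic programming fit together.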
