Unsupervised Learning of PCFGs with Normalizing Flow

Unsupervised PCFG inducers hypothesize sets of compact context-free rules as explanations for sentences. PCFG induction not only provides tools for low-resource languages, but also plays an important role in modeling language acquisition (Bannard et al., 2009; Abend et al., 2017). However, current PCFG induction models, which take word tokens as input, are unable to incorporate semantics and morphology into induction, and may encounter sparse-vocabulary issues when applied to morphologically rich languages. This paper describes a neural PCFG inducer that employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction with semantic and morphological information. Linguistically motivated sparsity and categorical distance constraints are imposed on the inducer as regularization. Experiments show that the PCFG induction model with normalizing flow produces grammars with state-of-the-art accuracy across a variety of languages. Ablations further show positive effects of the normalizing flow, context embeddings, and the proposed regularizers.
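The core mechanism referenced here is the change-of-variables formula that normalizing flows (NICE/Real NVP; Dinh et al.) use to assign exact densities to continuous embeddings. The sketch below is a minimal, self-contained illustration of one affine coupling layer, not the paper's actual architecture; the tiny tanh conditioner networks and the toy dimension are hypothetical stand-ins.

```python
import numpy as np

# Minimal sketch of a Real NVP-style affine coupling layer, illustrating
# how a normalizing flow assigns an exact log-density to an embedding via
# the change-of-variables formula. The conditioner networks (W_s, W_t)
# are toy placeholders, not the networks used in the paper.

rng = np.random.default_rng(0)
D = 4  # toy embedding dimension, split into two halves of size D // 2

W_s = rng.normal(scale=0.1, size=(D // 2, D // 2))
W_t = rng.normal(scale=0.1, size=(D // 2, D // 2))

def scale_shift(x1):
    # small conditioner producing a log-scale s and a shift t from x1
    return np.tanh(x1 @ W_s), np.tanh(x1 @ W_t)

def forward(x):
    # coupling transform: y1 = x1;  y2 = x2 * exp(s(x1)) + t(x1)
    x1, x2 = x[: D // 2], x[D // 2 :]
    s, t = scale_shift(x1)
    y = np.concatenate([x1, x2 * np.exp(s) + t])
    log_det = s.sum()  # Jacobian is triangular, so log|det J| = sum(s)
    return y, log_det

def inverse(y):
    # exact inverse: x2 = (y2 - t(y1)) * exp(-s(y1))
    y1, y2 = y[: D // 2], y[D // 2 :]
    s, t = scale_shift(y1)
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

def log_density(x):
    # change of variables: log p(x) = log N(f(x); 0, I) + log|det J_f(x)|
    y, log_det = forward(x)
    base = -0.5 * (y @ y) - 0.5 * D * np.log(2.0 * np.pi)
    return base + log_det

x = rng.normal(size=D)
y, _ = forward(x)
assert np.allclose(inverse(y), x)  # the flow is exactly invertible
```

Because only half of the dimensions are transformed at each layer, the Jacobian stays triangular and the log-determinant is cheap; stacking such layers (with the split alternated) yields the expressive yet tractable densities the abstract relies on for scoring context embeddings under each preterminal category.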

[1] Kewei Tu et al. Unsupervised learning of probabilistic grammars, 2012.

[2] Iain Murray et al. Masked Autoregressive Flow for Density Estimation, 2017, NIPS.

[3] Michael Collins et al. A Statistical Parser for Czech, 1999, ACL.

[4] Rens Bod et al. Unsupervised Parsing with U-DOP, 2006, CoNLL.

[5] Tomas Mikolov et al. Bag of Tricks for Efficient Text Classification, 2016, EACL.

[6] Shakir Mohamed et al. Variational Inference with Normalizing Flows, 2015, ICML.

[7] Yonatan Bisk et al. Probing the Linguistic Strengths and Limitations of Unsupervised Grammar Induction, 2015, ACL.

[8] Dan Klein et al. Learning Semantic Correspondences with Less Supervision, 2009, ACL.

[9] Yijia Liu et al. Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation, 2018, CoNLL.

[10] Z. Vendler et al. Res Cogitans: An Essay in Rational Psychology, 1972.

[11] Mark Steedman et al. Two Decades of Unsupervised POS Induction: How Far Have We Come?, 2010, EMNLP.

[12] Dan Klein et al. A Generative Constituent-Context Model for Improved Grammar Induction, 2002, ACL.

[13] Lane Schwartz et al. Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction, 2018, EMNLP.

[14] M. Tomasello et al. Modeling children's early grammatical knowledge, 2009, Proceedings of the National Academy of Sciences.

[15] Jeffrey Dean et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[16] Nathaniel J. Smith et al. Bootstrapping language acquisition, 2017, Cognition.

[17] Glenn Carroll et al. Two Experiments on Learning Probabilistic Dependency Grammars from Corpora, 1992.

[18] Lane Schwartz et al. Memory-Bounded Left-Corner Unsupervised Grammar Induction on Child-Directed Input, 2016, COLING.

[19] Lane Schwartz et al. Unsupervised Grammar Induction with Depth-bounded PCFG, 2018, TACL.

[20] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[21] Luke S. Zettlemoyer et al. Deep Contextualized Word Representations, 2018, NAACL.

[22] Jason Baldridge et al. Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models, 2011, ACL.

[23] Prafulla Dhariwal et al. Glow: Generative Flow with Invertible 1x1 Convolutions, 2018, NeurIPS.

[24] Samy Bengio et al. Density estimation using Real NVP, 2016, ICLR.

[25] J. Oates et al. Cognitive and language development in children, 2004.

[26] Yoshua Bengio et al. NICE: Non-linear Independent Components Estimation, 2014, ICLR.

[27] Graham Neubig et al. Unsupervised Learning of Syntactic Structure with Invertible Neural Projections, 2018, EMNLP.

[28] Chris Dyer et al. Unsupervised POS Induction with Word Embeddings, 2015, NAACL.

[29] Nianwen Xue et al. Developing Guidelines and Ensuring Consistency for Chinese Text Annotation, 2000, LREC.

[30] Kewei Tu et al. Unsupervised Neural Dependency Parsing, 2016, EMNLP.

[31] Thomas L. Griffiths et al. Bayesian Inference for PCFGs via Markov Chain Monte Carlo, 2007, NAACL.

[32] Julia Hirschberg et al. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure, 2007, EMNLP.

[33] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[34] B. MacWhinney et al. The Crosslinguistic Study of Sentence Processing, 1992.

[35] Beatrice Santorini et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[36] Dan Klein et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency, 2004, ACL.

[37] Yoav Seginer et al. Fast Unsupervised Incremental Parsing, 2007, ACL.

[38] Aaron C. Courville et al. Neural Language Modeling by Jointly Learning Syntax and Lexicon, 2017, ICLR.

[39] Wojciech Skut et al. A Linguistically Interpreted Corpus of German Newspaper Text, 1998, LREC.

[40] Cynthia Fisher et al. On the semantic content of subcategorization frames, 1991, Cognitive Psychology.

[41] Daniel Marcu et al. Unsupervised Neural Hidden Markov Models, 2016, SPNLP@EMNLP.

[42] Samuel R. Bowman et al. Grammar Induction with Neural Language Models: An Unusual Replication, 2018, EMNLP.