Grammar induction from (lots of) words alone

Grammar induction is the task of learning syntactic structure when that structure is hidden. Grammar induction from words alone is interesting because it is similar to the problem faced by a child learning a language. Previous work has typically assumed richer but cognitively implausible input, such as POS-tag-annotated data, which makes that work less relevant to human language acquisition. We show that grammar induction from words alone is in fact feasible when the model is provided with sufficient training data, and present two new streaming or mini-batch algorithms for PCFG inference that can learn from millions of words of training data. We compare the performance of these algorithms to a batch algorithm that learns from less data. The mini-batch algorithms outperform the batch algorithm, showing that cheap inference with more data is better than intensive inference with less data. Additionally, we show that the harmonic initialiser, which previous work identified as essential when learning from small POS-tag-annotated corpora (Klein and Manning, 2004), is not superior to a uniform initialisation.

[1] Phil Blunsom, et al. Collapsed Variational Bayesian Inference for PCFGs, 2013, CoNLL.

[2] Sharon Goldwater, et al. Unsupervised Dependency Parsing with Acoustic Cues, 2013, Transactions of the Association for Computational Linguistics.

[3] Giorgio Satta, et al. Efficient Parsing for Bilexical Context-Free Grammars and Head Automaton Grammars, 1999, ACL.

[4] Jason M. Brenier, et al. Predictability Effects on Durations of Content and Function Words in Conversational English, 2009.

[5] Christopher D. Manning, et al. Generating Typed Dependency Parses from Phrase Structure Parses, 2006, LREC.

[6] Steve Young, et al. Applications of stochastic context-free grammars using the Inside-Outside algorithm, 1990.

[7] Dan Klein, et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency, 2004, ACL.

[8] Vladimir Solmon, et al. The estimation of stochastic context-free grammars using the Inside-Outside algorithm, 2003.

[9] Raymond J. Mooney, et al. Generative Alignment and Semantic Parsing for Learning from Ambiguous Supervision, 2010, COLING.

[10] S. Freytag. Knowledge of Language: Its Nature, Origin, and Use, 2016.

[11] Phong Le, et al. Unsupervised Dependency Parsing: Let’s Use Supervised Parsers, 2015, NAACL.

[12] Yee Whye Teh, et al. A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation, 2006, NIPS.

[13] Francis R. Bach, et al. Online Learning for Latent Dirichlet Allocation, 2010, NIPS.

[14] Mark Steedman, et al. A Probabilistic Model of Syntactic and Semantic Acquisition from Child-Directed Utterances and their Meanings, 2012, EACL.

[15] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[16] Valentin I. Spitkovsky, et al. Unsupervised Dependency Parsing without Gold Part-of-Speech Tags, 2011, EMNLP.

[17] Mark Johnson, et al. PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names, 2010, ACL.

[18] Yonatan Bisk, et al. An HDP Model for Inducing Combinatory Categorial Grammars, 2013, TACL.

[19] Mark Johnson, et al. Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing, 2009, NAACL.

[20] Mark Steedman, et al. The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue, 2010, Lang. Resour. Evaluation.

[21] Yee Whye Teh, et al. On Smoothing and Inference for Topic Models, 2009, UAI.

[22] Kenichi Kurihara, et al. An Application of the Variational Bayesian Approach to Probabilistic Context-Free Grammars, 2004.

[23] Andre Wibisono, et al. Streaming Variational Bayes, 2013, NIPS.

[24] Mark Johnson, et al. Reducing Grounded Learning Tasks To Grammatical Inference, 2011, EMNLP.

[25] Mark Johnson, et al. Transforming Projective Bilexical Dependency Grammars into efficiently-parsable CFGs with Unfold-Fold, 2007, ACL.

[26] Mark Johnson, et al. Joint Incremental Disfluency Detection and Dependency Parsing, 2014, TACL.