Online Adaptor Grammars with Hybrid Inference

Adaptor grammars are a flexible, powerful formalism for defining nonparametric, unsupervised models of grammar productions. This flexibility comes at the cost of expensive inference. We address the difficulty of inference with an online algorithm that uses a hybrid of Markov chain Monte Carlo and variational inference. We show that this inference strategy improves scalability without sacrificing performance on unsupervised word segmentation and topic modeling tasks.
