Type-Based MCMC

Most algorithms for learning latent-variable models, such as EM and existing Gibbs samplers, are token-based, meaning that they update the variables associated with one sentence at a time. The incremental nature of these methods makes them susceptible to local optima and slow mixing. In this paper, we introduce a type-based sampler, which updates a block of variables, identified by a type, that spans multiple sentences. We show improvements on part-of-speech induction, word segmentation, and learning tree-substitution grammars.
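The contrast between token-based and type-based updates can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the paper's implementation. It assumes n binary variables that share a single type and are exchangeable under a collapsed Beta-Bernoulli model with hyperparameters alpha and beta (both assumed > 0), so every assignment with the same number m of ones has the same probability. The block update therefore samples m with weight proportional to C(n, m) times that probability, then picks which m sites receive a 1 uniformly at random. The names `count_log_weights` and `type_based_update`, and the arguments `ones_elsewhere`/`zeros_elsewhere` (counts contributed by variables outside the block), are hypothetical.

```python
import math
import random


def log_beta(a, b):
    # Log of the Beta function B(a, b), computed with log-gamma for stability.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def count_log_weights(n_sites, alpha, beta, ones_elsewhere, zeros_elsewhere):
    """Unnormalized log p(m) for m = 0..n_sites ones within the block.

    Hypothetical toy setting: a collapsed Beta(alpha, beta)-Bernoulli model,
    in which every assignment with m ones has the same marginal probability,
    so p(m) is proportional to C(n, m) times that probability.
    """
    weights = []
    for m in range(n_sites + 1):
        log_choose = (math.lgamma(n_sites + 1) - math.lgamma(m + 1)
                      - math.lgamma(n_sites - m + 1))
        log_lik = log_beta(alpha + ones_elsewhere + m,
                           beta + zeros_elsewhere + n_sites - m)
        weights.append(log_choose + log_lik)
    return weights


def type_based_update(n_sites, alpha, beta, ones_elsewhere, zeros_elsewhere, rng):
    """Jointly resample all n_sites binary variables that share one type."""
    log_w = count_log_weights(n_sites, alpha, beta, ones_elsewhere, zeros_elsewhere)
    top = max(log_w)  # subtract the max before exponentiating to avoid underflow
    probs = [math.exp(w - top) for w in log_w]
    m = rng.choices(range(n_sites + 1), weights=probs)[0]
    # Exchangeability: given m, every subset of m sites is equally likely.
    chosen = set(rng.sample(range(n_sites), m))
    return [1 if i in chosen else 0 for i in range(n_sites)]


rng = random.Random(0)
print(type_based_update(5, 0.1, 0.1, 0, 0, rng))
```

With the sparse prior alpha = beta = 0.1 and n = 5, the weights put most of the mass on m = 0 and m = 5, so a single block update can jump directly between all zeros and all ones. A token-based sampler starting from all zeros would instead set one site to 1 with probability 0.1/4.2 ≈ 0.024 and would have to pass through low-probability intermediate states, which is one way to see the slow mixing described above.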
