GIBBS SAMPLING FOR THE UNINITIATED

This document is intended for computer scientists who would like to try out a Markov Chain Monte Carlo (MCMC) technique, particularly in order to do inference with Bayesian models on problems related to text processing. We try to keep theory to the absolute minimum needed, though we work through the details much more explicitly than you usually see even in "introductory" explanations. That means we’ve attempted to be ridiculously explicit in our exposition and notation. After providing the reasons and reasoning behind Gibbs sampling (and at least nodding our heads in the direction of theory), we work through an example application in detail: the derivation of a Gibbs sampler for
