Hierarchical Dirichlet Processes

We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. To tie together the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Because such a base measure is discrete with probability one, the child Dirichlet processes necessarily share atoms, and so, as desired, the mixture models in the different groups share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stick-breaking process and in terms of a generalization of the Chinese restaurant process that we refer to as the “Chinese restaurant franchise.” We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
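
The sharing of atoms described above can be illustrated by simulating the Chinese restaurant franchise as a prior over partitions. The following is a minimal sketch, not the inference algorithm of the paper: groups play the role of restaurants, observations are customers, and mixture components are dishes. The function name and the parameter names `alpha` (group-level concentration) and `gamma` (top-level concentration) are assumptions made for illustration, and groups are filled one after another for simplicity.

```python
import random


def chinese_restaurant_franchise(group_sizes, alpha, gamma, seed=0):
    """Simulate dish (component) assignments under a Chinese restaurant franchise.

    group_sizes: number of customers (observations) in each restaurant (group).
    alpha: concentration of each group-level DP (assumed name), must be > 0.
    gamma: concentration of the top-level DP (assumed name), must be > 0.
    Returns one list of dish indices per group, one index per customer.
    """
    rng = random.Random(seed)
    dish_table_counts = []  # number of tables, across all restaurants, serving each dish
    assignments = []
    for n_customers in group_sizes:
        table_sizes = []  # customers seated at each table of this restaurant
        table_dish = []   # dish served at each table of this restaurant
        dishes = []
        for _ in range(n_customers):
            # Join an existing table with weight equal to its size,
            # or open a new table with weight alpha.
            t = rng.choices(range(len(table_sizes) + 1),
                            weights=table_sizes + [alpha])[0]
            if t == len(table_sizes):
                # A new table orders an existing dish with weight equal to the
                # number of tables already serving it, or a new dish with weight gamma.
                k = rng.choices(range(len(dish_table_counts) + 1),
                                weights=dish_table_counts + [gamma])[0]
                if k == len(dish_table_counts):
                    dish_table_counts.append(0)
                dish_table_counts[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            dishes.append(table_dish[t])
        assignments.append(dishes)
    return assignments


if __name__ == "__main__":
    groups = chinese_restaurant_franchise([50, 50, 50], alpha=1.0, gamma=1.0, seed=1)
    for j, dishes in enumerate(groups):
        print(f"group {j}: components used = {sorted(set(dishes))}")
```

Because new tables in every group choose dishes from one shared pool of counts, the same component index can appear in several groups, while the number of components in use is not fixed in advance.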