Simple approximate MAP inference for Dirichlet process mixtures

The Dirichlet process mixture model (DPMM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so computationally intensive techniques such as Gibbs sampling are required. As a result, DPMM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference are plentiful. For example, they would not be practical for digital signal processing on embedded hardware, where computational resources are at a serious premium. Here, we develop a simplified yet statistically rigorous approximate maximum a posteriori (MAP) inference algorithm for DPMMs. This algorithm is as simple as DP-means clustering and solves the MAP problem as well as Gibbs sampling does, while requiring only a fraction of the computational effort. (Freely available code implementing the MAP-DP algorithm for Gaussian mixtures can be found at http://www.maxlittle.net/.) Unlike related small-variance asymptotics (SVA) approaches, our method is non-degenerate and so inherits the “rich get richer” property of the Dirichlet process. It also retains a non-degenerate closed-form likelihood, which enables out-of-sample calculations and the use of standard tools such as cross-validation. We illustrate the benefits of our algorithm on a range of examples and contrast it with variational, SVA, and sampling approaches, in terms of both computational complexity and clustering performance. We demonstrate the wide applicability of our approach by presenting an approximate MAP inference method for the infinite hidden Markov model, whose performance compares favorably with a recently proposed hybrid SVA approach. Similarly, we show how our algorithm can be applied to a semiparametric mixed-effects regression model in which the random effects distribution is modelled using an infinite mixture model, as used in longitudinal progression modelling in population health science. Finally, we propose directions for future research on approximate MAP inference in Bayesian nonparametrics.
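To make the flavour of such a greedy MAP-style update concrete, the following is a minimal Python sketch, not the authors' released implementation (see http://www.maxlittle.net/ for that). It assumes a spherical Gaussian likelihood with known variance sigma2, a Gaussian prior with variance sigma2_0 on cluster centres, uses empirical cluster means as a stand-in for full posterior means, and drops log-normaliser terms for readability; the function name map_dp_gaussian and all parameter names are illustrative.

```python
import numpy as np

def map_dp_gaussian(X, alpha=1.0, sigma2=1.0, sigma2_0=10.0, max_iter=50):
    """Illustrative greedy MAP-style assignment for a spherical-Gaussian DPMM."""
    n, _ = X.shape
    mu0 = X.mean(axis=0)                 # prior mean for cluster centres
    z = np.zeros(n, dtype=int)           # start with every point in one cluster
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            z_rest = np.delete(z, i)     # assignments of all other points
            X_rest = np.delete(X, i, axis=0)
            labels = np.unique(z_rest)
            costs = np.empty(len(labels) + 1)
            for j, k in enumerate(labels):
                members = X_rest[z_rest == k]
                mu_k = members.mean(axis=0)   # simplification: plug-in cluster mean
                # squared-error likelihood term plus CRP "rich get richer" term
                costs[j] = np.sum((X[i] - mu_k) ** 2) / (2.0 * sigma2) - np.log(len(members))
            # opening a new cluster: prior-predictive fit, weighted by alpha
            costs[-1] = np.sum((X[i] - mu0) ** 2) / (2.0 * (sigma2 + sigma2_0)) - np.log(alpha)
            best = int(np.argmin(costs))
            new_label = labels[best] if best < len(labels) else (z_rest.max() + 1 if z_rest.size else 0)
            if new_label != z[i]:
                z[i] = new_label
                changed = True
        if not changed:                  # assignments stable: a local optimum reached
            break
    _, z = np.unique(z, return_inverse=True)  # relabel clusters as 0..K-1
    return z
```

The -log(N_k) term is what distinguishes this from degenerate SVA/DP-means updates: larger clusters are cheaper to join, preserving the Dirichlet process's "rich get richer" behaviour. On a toy input such as np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5]), a sketch like this would typically recover two clusters.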
