A generalized Bayes framework for probabilistic clustering

Loss-based clustering methods, such as k-means and its variants, are standard tools for finding groups in data. A key disadvantage, however, is that they provide no quantification of uncertainty in the estimated clusters. Model-based clustering via mixture models offers an alternative, but such methods face computational problems and high sensitivity to the choice of kernel. This article proposes a generalized Bayes framework that bridges these two paradigms through the use of Gibbs posteriors. In Bayesian updating, the log-likelihood is replaced by a loss function for clustering, leading to a rich family of clustering methods. The Gibbs posterior represents a coherent updating of Bayesian beliefs without the need to specify a likelihood for the data, and can be used to characterize uncertainty in clustering. We consider losses based on Bregman divergences and on pairwise similarities, and develop efficient deterministic algorithms for point estimation along with sampling algorithms for uncertainty quantification. Several existing clustering algorithms, including k-means, can be interpreted as generalized Bayes estimators under our framework, and hence we provide a method of uncertainty quantification for these approaches.
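
To make the updating rule concrete: in the generalized Bayes (Gibbs posterior) construction the abstract describes, the posterior over the clustering parameter θ takes the form π_n(θ | x_1, …, x_n) ∝ exp{ −w Σ_{i=1}^{n} ℓ(θ; x_i) } π(θ), where ℓ is a clustering loss and w > 0 is a learning-rate parameter whose calibration matters in practice. The sketch below illustrates this with the squared-error (k-means) loss over label assignments and a simple Metropolis sampler; it is a minimal illustration under these assumptions, not the algorithms developed in the article, and all function names and parameter choices here are illustrative.

    # Illustrative sketch: sampling a Gibbs posterior over cluster labels
    # with the squared-error (k-means / Bregman) loss and a uniform prior.
    import numpy as np

    rng = np.random.default_rng(0)

    def kmeans_loss(X, labels, K):
        """Sum of squared distances of points to their cluster centroids."""
        total = 0.0
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:
                total += ((members - members.mean(axis=0)) ** 2).sum()
        return total

    def gibbs_posterior_sample(X, K=2, w=1.0, n_iter=2000):
        """Metropolis sampler targeting pi(labels) proportional to
        exp(-w * loss(labels)) under a uniform prior over assignments."""
        n = len(X)
        labels = rng.integers(K, size=n)
        current = kmeans_loss(X, labels, K)
        samples = []
        for _ in range(n_iter):
            i = rng.integers(n)                # pick one point at random
            proposal = labels.copy()
            proposal[i] = rng.integers(K)      # propose a new label for it
            new = kmeans_loss(X, proposal, K)
            # Symmetric proposal, so accept with prob min(1, exp(-w * dLoss)).
            if new <= current or rng.random() < np.exp(-w * (new - current)):
                labels, current = proposal, new
            samples.append(labels.copy())
        return samples

    # Toy data: two well-separated Gaussian blobs.
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    draws = gibbs_posterior_sample(X, K=2, w=1.0)
    # Posterior co-clustering probability of points 0 and 1 (both in blob 1):
    print(np.mean([s[0] == s[1] for s in draws[1000:]]))

Under this loss, maximizing the Gibbs posterior with a uniform prior is equivalent to minimizing the k-means objective, which is the sense in which the abstract describes k-means as a generalized Bayes estimator; the posterior draws additionally supply uncertainty summaries such as co-clustering probabilities.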
