Computing exact clustering posteriors with subset convolution

ABSTRACT
An exponential-time exact algorithm is provided for the task of clustering n items of data into k clusters. Instead of seeking a single partition, the algorithm computes posterior probabilities for summary statistics: the number of clusters and pairwise co-occurrence. The method is based on subset convolution and yields the posterior distribution for the number of clusters in $O(n 3^n)$ operations, or $O(n^3 2^n)$ using fast subset convolution. Pairwise co-occurrence probabilities are then obtained in $O(n^3 2^n)$ operations. This is considerably faster than exhaustive enumeration of all partitions, whose number (the Bell number $B_n$) grows faster than any exponential.
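As an illustration only, the following Python sketch computes both summary statistics under stated assumptions. It assumes a product-form posterior, P(partition | data) ∝ ∏ over blocks S of c(S), where c(S) ≥ 0 is a hypothetical per-cluster weight (for example a cohesion times a marginal likelihood), stored in a list `c` indexed by bitmask with c[0] = 0 so that empty clusters are excluded. All function names are hypothetical. The number-of-clusters part uses naive subset convolution, Z_k = c^{*k}(U)/k!, which matches the O(n 3^n) bound. The co-occurrence part swaps in a simpler recursion over the lowest item of each block, costing O(3^n + n^2 2^n); this is slower for large n than the fast-subset-convolution route and is not presented as the authors' method.

```python
from math import factorial
from typing import List


def subset_convolve(f: List[float], g: List[float], n: int) -> List[float]:
    """Naive subset convolution h(S) = sum over T subset of S of f(T) g(S - T)."""
    h = [0.0] * (1 << n)
    for S in range(1 << n):
        T = S
        while True:
            h[S] += f[T] * g[S ^ T]  # S ^ T is S minus T because T is a subset of S
            if T == 0:
                break
            T = (T - 1) & S  # step to the next smaller subset of S
    return h  # 3^n (S, T) pairs in total


def partition_sums(c: List[float], n: int) -> List[float]:
    """Z[k] = sum over partitions of all n items into k blocks of prod of c[S]."""
    full = (1 << n) - 1
    Z = [0.0] * (n + 1)
    power = list(c)  # c^{*1}
    Z[1] = power[full]
    for k in range(2, n + 1):
        power = subset_convolve(power, c, n)  # c^{*k}: ordered k-tuples of disjoint blocks
        Z[k] = power[full] / factorial(k)  # each unordered partition appears k! times
    return Z


def cluster_count_posterior(c: List[float], n: int) -> List[float]:
    """P(K = k | data) for k = 0..n under the assumed product-form posterior."""
    Z = partition_sums(c, n)
    total = sum(Z)
    return [z / total for z in Z]


def all_partition_sums(c: List[float], n: int) -> List[float]:
    """Z[R] = sum over all partitions of item set R of prod of c[S]; Z[empty] = 1."""
    Z = [0.0] * (1 << n)
    Z[0] = 1.0
    for R in range(1, 1 << n):
        low = R & -R  # lowest item of R; the block T containing it is enumerated
        rest = R ^ low
        sub = rest
        while True:
            T = sub | low
            Z[R] += c[T] * Z[R ^ T]  # R ^ T < R, so it is already computed
            if sub == 0:
                break
            sub = (sub - 1) & rest
    return Z


def cooccurrence(c: List[float], n: int) -> List[List[float]]:
    """P[i][j] = posterior probability that items i and j share a cluster."""
    full = (1 << n) - 1
    Z = all_partition_sums(c, n)
    P = [[0.0] * n for _ in range(n)]
    for S in range(1, 1 << n):
        mass = c[S] * Z[full ^ S] / Z[full]  # probability that S is one of the blocks
        members = [i for i in range(n) if S >> i & 1]
        for i in members:
            for j in members:
                P[i][j] += mass
    return P
```

With all weights equal to 1, every partition is equally likely, so for n = 4 the values Z[1..4] are the Stirling numbers of the second kind (1, 7, 6, 1), P(K = 2) = 7/15, and items 0 and 1 co-occur with probability 5/15 = 1/3.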
