Bayesian Cluster Enumeration Criterion for Unsupervised Learning

We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed dataset as the maximization of the posterior probability of the candidate models. Under mild assumptions, we provide a general BIC expression for a broad class of data distributions, which serves as a starting point for deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed data. We show that incorporating the data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term differs from that of the original BIC. Finally, we propose a two-step cluster enumeration algorithm: first, a model-based unsupervised learning algorithm partitions the data according to each candidate model; then, the number of clusters is selected as the one associated with the candidate model for which the proposed BIC is maximal. The performance of the two-step algorithm is evaluated using synthetic and real datasets.
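
For context, the original Schwarz BIC selects the model that maximizes ln L̂ − (q/2) ln N, where L̂ is the maximized likelihood, q the number of free model parameters, and N the number of observations; the criterion derived here keeps this structure but modifies the penalty term. The sketch below illustrates the two-step procedure for Gaussian candidate models, assuming scikit-learn's EM-based GaussianMixture for the model-based partitioning step and using the classical BIC as a stand-in score in the selection step; the proposed criterion with its modified penalty would be substituted for the bic() call.

```python
# Minimal sketch of the two-step cluster enumeration procedure.
# Step 1: fit a model-based (EM) clustering for each candidate number of clusters.
# Step 2: pick the candidate that maximizes the information criterion.
# NOTE: gmm.bic() is the classical Schwarz BIC, used here as a stand-in for
# the paper's proposed criterion, which differs in its penalty term.
import numpy as np
from sklearn.mixture import GaussianMixture

def enumerate_clusters(X, l_min=1, l_max=10, seed=0):
    """Return the estimated number of clusters and all fitted candidate models."""
    best_l, best_score, models = None, -np.inf, {}
    for l in range(l_min, l_max + 1):
        # Step 1: partition the data under the candidate model with l clusters.
        gmm = GaussianMixture(n_components=l, covariance_type="full",
                              n_init=5, random_state=seed).fit(X)
        models[l] = gmm
        # Step 2: score the fitted model. sklearn's bic() is lower-is-better,
        # so negate it to match the maximization formulation of the paper.
        score = -gmm.bic(X)
        if score > best_score:
            best_l, best_score = l, score
    return best_l, models

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy dataset: three well-separated 2-D Gaussian blobs.
    X = np.vstack([rng.normal(loc=m, scale=0.5, size=(200, 2))
                   for m in ([0, 0], [5, 5], [0, 6])])
    l_hat, _ = enumerate_clusters(X, l_min=1, l_max=8)
    print("estimated number of clusters:", l_hat)
```

On the toy three-blob data this should recover three clusters. Swapping in the proposed criterion only changes the score computation; the partitioning step and the argmax selection are unchanged.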
