A “Density-Based” Algorithm for Cluster Analysis Using Species Sampling Gaussian Mixture Models

We propose a new model for cluster analysis in a Bayesian nonparametric framework. Our model combines two ingredients: species sampling mixture models of Gaussian distributions on the one hand, and a deterministic clustering procedure (DBSCAN) on the other. Two observations from the underlying species sampling mixture model share the same cluster if the distance between the densities corresponding to their latent parameters is smaller than a threshold; this yields a random partition that is coarser than the one induced by the species sampling mixture. Since this procedure depends on the value of the threshold, we suggest a strategy to fix it. In addition, we discuss implementation and applications of the model, and compare it with more standard clustering algorithms. Supplementary materials for the article are available online.
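
The thresholding step described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the article: the mixture components are univariate Gaussians, the distance between densities is the Hellinger distance (the article may use a different one), and components are linked transitively, as in DBSCAN with a minimum neighbourhood size of one. The names `hellinger_gauss`, `merge_components`, `params`, `allocations`, and `eps` are hypothetical.

```python
import math


def hellinger_gauss(m1, s1, m2, s2):
    """Hellinger distance between N(m1, s1^2) and N(m2, s2^2) (closed form)."""
    var_sum = s1 * s1 + s2 * s2
    # Bhattacharyya coefficient of the two Gaussian densities.
    bc = math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(
        -((m1 - m2) ** 2) / (4.0 * var_sum)
    )
    return math.sqrt(max(0.0, 1.0 - bc))


def merge_components(params, eps):
    """Link components whose densities are closer than eps; return merged labels.

    params is a list of (mean, sd) pairs, one per distinct latent parameter.
    Linking is transitive (connected components via union-find), which is an
    assumption of this sketch.
    """
    parent = list(range(len(params)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(params)):
        for j in range(i + 1, len(params)):
            if hellinger_gauss(*params[i], *params[j]) < eps:
                parent[find(i)] = find(j)

    # Relabel the roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(params))]


# Hypothetical state of one sampler iteration: four distinct latent
# parameters and the component index assigned to each of seven observations.
params = [(0.0, 1.0), (0.2, 1.0), (5.0, 1.0), (5.1, 1.1)]
allocations = [0, 0, 1, 2, 3, 3, 2]
eps = 0.3  # illustrative threshold, not a recommended value

component_labels = merge_components(params, eps)
coarse = [component_labels[a] for a in allocations]
print(component_labels)
print(coarse)
```

In this toy example the mixture partition has four blocks, while the merged partition has two, which shows in what sense the thresholded partition is coarser. In a full analysis the merge would be applied to the latent parameters at each posterior draw, so the coarse partition is itself random.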
