Ensemble clustering based on weighted co-association matrices: Error bound and convergence properties

Abstract. We consider an approach to ensemble clustering based on weighted co-association matrices, where the weights are determined by evaluation functions. Using a latent variable model of a clustering ensemble, we prove that, under certain assumptions, clustering quality improves as the ensemble size and the expectation of the evaluation function increase. Analytical dependencies between the ensemble size and the quality estimates are derived. The theoretical results are supported by numerical examples using Monte Carlo modeling and segmentation of a real hyperspectral image in the presence of noise channels.
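To make the central object concrete, here is a minimal sketch of a weighted co-association matrix. It is an illustration under stated assumptions, not the construction used in the paper: the function name `weighted_coassociation`, the toy partitions, and the weight values are all hypothetical, and the weights are simply supplied by hand where the paper would obtain them from an evaluation function.

```python
import numpy as np

def weighted_coassociation(partitions, weights):
    """Weighted fraction of partitions that place each pair of objects together.

    partitions: array-like of shape (n_partitions, n_objects) with cluster labels.
    weights:    one non-negative weight per partition (hypothetical values here).
    """
    partitions = np.asarray(partitions)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so entries lie in [0, 1]
    # same[l, i, j] is True when partition l puts objects i and j in one cluster
    same = partitions[:, :, None] == partitions[:, None, :]
    # Weighted sum over the partition axis gives an (n_objects, n_objects) matrix
    return np.tensordot(weights, same.astype(float), axes=1)

# Toy ensemble: three partitions of four objects (assumed data)
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
weights = [2.0, 1.0, 1.0]  # assumed scores; higher means a more trusted partition

H = weighted_coassociation(partitions, weights)
print(H)
```

Each entry of `H` can be read as a weighted vote that two objects belong together. A final partition is typically obtained by applying a clustering algorithm to this matrix treated as a similarity; that step is omitted here.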
