Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis

The clustering ensemble technique aims to combine multiple clusterings into a probably better and more robust clustering and has been receiving an increasing attention in recent years. There are mainly two aspects of limitations in the existing clustering ensemble approaches. Firstly, many approaches lack the ability to weight the base clusterings without access to the original data and can be affected significantly by the low-quality, or even ill clusterings. Secondly, they generally focus on the instance level or cluster level in the ensemble system and fail to integrate multi-granularity cues into a unified model. To address these two limitations, this paper proposes to solve the clustering ensemble problem via crowd agreement estimation and multi-granularity link analysis. We present the normalized crowd agreement index (NCAI) to evaluate the quality of base clusterings in an unsupervised manner and thus weight the base clusterings in accordance with their clustering validity. To explore the relationship between clusters, the source aware connected triple (SACT) similarity is introduced with regard to their common neighbors and the source reliability. Based on NCAI and multi-granularity information collected among base clusterings, clusters, and data instances, we further propose two novel consensus functions, termed weighted evidence accumulation clustering (WEAC) and graph partitioning with multi-granularity link analysis (GP-MGLA) respectively. The experiments are conducted on eight real-world datasets. The experimental results demonstrate the effectiveness and robustness of the proposed methods.

[1]  Ieee Xplore,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  André Carlos Ponce de Leon Ferreira de Carvalho,et al.  Multi-objective clustering ensemble for gene expression data analysis , 2009, Neurocomputing.

[3]  Carla E. Brodley,et al.  Solving cluster ensemble problems by bipartite graph partitioning , 2004, ICML.

[4]  Tommy W. S. Chow,et al.  Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density , 2004, Pattern Recognit..

[5]  Sandro Vega-Pons,et al.  Weighted partition consensus via kernels , 2010, Pattern Recognit..

[6]  Frank Plastria,et al.  On the point for which the sum of the distances to n given points is minimum , 2009, Ann. Oper. Res..

[7]  Anil K. Jain,et al.  Clustering ensembles: models of consensus and weak partitions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Xi Wang,et al.  Clustering aggregation by probability accumulation , 2009, Pattern Recognit..

[9]  Chang-Dong Wang,et al.  Multi-Exemplar Affinity Propagation , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  James T. Kwok,et al.  Time and space efficient spectral clustering via column sampling , 2011, CVPR 2011.

[11]  Longin Jan Latecki,et al.  Clustering Aggregation as Maximum-Weight Independent Set , 2012, NIPS.

[12]  Chang-Dong Wang,et al.  Energy based competitive learning , 2011, Neurocomputing.

[13]  Erkki Oja,et al.  Rival penalized competitive learning for clustering analysis, RBF net, and curve detection , 1993, IEEE Trans. Neural Networks.

[14]  Shih-Fu Chang,et al.  Segmentation using superpixels: A bipartite graph partitioning approach , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Chris H. Q. Ding,et al.  Weighted Consensus Clustering , 2008, SDM.

[16]  DomeniconiCarlotta,et al.  Weighted cluster ensembles , 2009 .

[17]  Sandro Vega-Pons,et al.  Weighted association based methods for the combination of heterogeneous partitions , 2011, Pattern Recognit. Lett..

[18]  Jian Yu,et al.  Clustering Ensembles Based on Normalized Edges , 2007, PAKDD.

[19]  Chang-Dong Wang,et al.  Incremental support vector clustering with outlier detection , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[20]  Jinfeng Yi,et al.  Robust Ensemble Clustering by Matrix Completion , 2012, 2012 IEEE 12th International Conference on Data Mining.

[21]  James Surowiecki The wisdom of crowds: Why the many are smarter than the few and how collective wisdom shapes business, economies, societies, and nations Doubleday Books. , 2004 .

[22]  Anil K. Jain Data clustering: 50 years beyond K-means , 2008, Pattern Recognit. Lett..

[23]  Chang-Dong Wang,et al.  Incremental Support Vector Clustering , 2011, 2011 IEEE 11th International Conference on Data Mining Workshops.

[24]  Ana L. N. Fred,et al.  Combining multiple clusterings using evidence accumulation , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Xiaoyi Jiang,et al.  Ensemble clustering by means of clustering embedding in vector spaces , 2014, Pattern Recognit..

[26]  G LindsayBruce,et al.  A Nonparametric Statistical Approach to Clustering via Mode Identification , 2007 .

[27]  Dan A. Simovici,et al.  Finding Median Partitions Using Information-Theoretical-Based Genetic Algorithms , 2002, J. Univers. Comput. Sci..

[28]  M. Levandowsky,et al.  Distance between Sets , 1971, Nature.

[29]  Lei Xu,et al.  Rival penalized competitive learning , 2007, Scholarpedia.

[30]  Surajit Ray,et al.  A Nonparametric Statistical Approach to Clustering via Mode Identification , 2007, J. Mach. Learn. Res..

[31]  Zhi-Hua Zhou,et al.  Multi-instance clustering with applications to multi-instance prediction , 2009, Applied Intelligence.

[32]  Esa Alhoniemi,et al.  Clustering of the self-organizing map , 2000, IEEE Trans. Neural Networks Learn. Syst..

[33]  Jingsheng Lei,et al.  A clustering ensemble: Two-level-refined co-association matrix with path-based transformation , 2015, Pattern Recognit..

[34]  Chang-Dong Wang,et al.  Position regularized Support Vector Domain Description , 2013, Pattern Recognit..

[35]  Chang-Dong Wang,et al.  Exploiting the Wisdom of Crowd: A Multi-granularity Approach to Clustering Ensemble , 2013, IScIDE.

[36]  黄栋,et al.  Exploiting the Wisdom of Crowd: A Multi-Granularity Approach to Clustering Ensemble , 2013 .

[37]  Maoguo Gong,et al.  Spectral clustering with eigenvector selection based on entropy ranking , 2010, Neurocomputing.

[38]  Tossapon Boongoen,et al.  Refining Pairwise Similarity Matrix for Cluster Ensemble Problem with Cluster Relations , 2008, Discovery Science.

[39]  Xiaoli Z. Fern,et al.  Cluster Ensemble Selection , 2008, Statistical analysis and data mining.

[40]  Ludmila I. Kuncheva,et al.  Moderate diversity for better cluster ensembles , 2006, Inf. Fusion.

[41]  Tossapon Boongoen,et al.  A Link-Based Approach to the Cluster Ensemble Problem , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  K JainAnil,et al.  Combining Multiple Clusterings Using Evidence Accumulation , 2005 .

[43]  Edmund A. Mennis The Wisdom of Crowds: Why the Many Are Smarter than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations , 2006 .

[44]  Sandro Vega-Pons,et al.  A Survey of Clustering Ensemble Algorithms , 2011, Int. J. Pattern Recognit. Artif. Intell..

[45]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[46]  Carlotta Domeniconi,et al.  Weighted cluster ensembles: Methods and analysis , 2009, TKDD.

[47]  Ertunc Erdil,et al.  Combining multiple clusterings using similarity graph , 2011, Pattern Recognit..

[48]  Joydeep Ghosh,et al.  Cluster ensembles , 2011, Data Clustering: Algorithms and Applications.

[49]  Chang-Dong Wang,et al.  SVStream: A Support Vector-Based Algorithm for Clustering Data Streams , 2013, IEEE Transactions on Knowledge and Data Engineering.