Exploiting the Wisdom of Crowd: A Multi-granularity Approach to Clustering Ensemble

A clustering ensemble system has three levels of granularity: base clusterings, clusters, and instances. In this paper, we propose a novel clustering ensemble approach that integrates information from these different levels of granularity into a unified graph model. The normalized crowd agreement index (NCAI) is presented for estimating the quality of base clusterings in an unsupervised manner. The source aware connected triple (SACT) method is proposed for inter-cluster link analysis. By treating the clusters and the instances together as nodes, we formulate the ensemble of base clusterings and the multiple levels of relationships among them as a bipartite graph. The final consensus clustering is obtained via an efficient graph partitioning algorithm. Experiments are conducted on four real-world datasets from the UCI Machine Learning Repository. Experimental results demonstrate the effectiveness of our approach for solving the clustering ensemble problem.
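To make the bipartite formulation concrete, the following is a minimal sketch of the general idea only: instances and clusters become the two node sets, and an edge links each instance to every cluster that contains it. It does not reproduce the approach described above. The abstract does not define NCAI or SACT, so no quality weights or inter-cluster links are included, and a plain two-way spectral cut stands in for the graph partitioning algorithm. The function names `build_bipartite_incidence` and `bipartition` and the toy labels are hypothetical.

```python
import numpy as np

def build_bipartite_incidence(base_clusterings):
    """Return B with one row per instance and one column per cluster.

    B[i, c] = 1 when instance i belongs to cluster c. Clusters from all
    base clusterings are stacked side by side (unweighted; an assumption).
    """
    columns = []
    for labels in base_clusterings:
        labels = np.asarray(labels)
        for cluster_id in np.unique(labels):
            columns.append((labels == cluster_id).astype(float))
    return np.column_stack(columns)

def bipartition(B):
    """Split the instances in two with a spectral cut of the bipartite graph."""
    d_rows = B.sum(axis=1)  # instance degrees
    d_cols = B.sum(axis=0)  # cluster degrees (never zero by construction)
    # Degree-normalized incidence matrix.
    Bn = B / np.sqrt(d_rows)[:, None] / np.sqrt(d_cols)[None, :]
    U, _, _ = np.linalg.svd(Bn, full_matrices=False)
    # The second left singular vector carries the two-way split.
    z = U[:, 1] / np.sqrt(d_rows)
    return (z > 0).astype(int)

# Three toy base clusterings of six instances (illustrative data only).
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
B = build_bipartite_incidence(base_clusterings)
consensus = bipartition(B)
print(B.shape)
print(consensus)
```

In this toy ensemble, two of the three base clusterings separate the first three instances from the last three, and the spectral cut recovers that split. Which group receives label 0 depends on the sign of the singular vector, so only the grouping is meaningful.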
