Paper information - Refining Pairwise Similarity Matrix for Cluster Ensemble Problem with Cluster Relations

Cluster ensemble methods have recently emerged as powerful techniques that aggregate several input data clusterings into a single output clustering with improved robustness and stability. This paper presents two new similarity matrices, which are empirically evaluated against the standard co-association matrix on six datasets (both artificial and real) using four combination methods and six clustering validity criteria. In all cases, the results suggest that the new link-based similarity matrices efficiently extract the information embedded in the input clusterings and regularly yield higher clustering quality than the co-association matrix.
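For orientation, the standard co-association matrix used as the baseline can be sketched as follows. This is a minimal illustration of the usual definition (the fraction of input clusterings in which two items share a cluster), not the paper's link-based matrices, which the abstract does not specify. The function name `co_association` and the toy labelings are assumptions made for this example.

```python
def co_association(labelings):
    """Return the n x n co-association matrix for a list of label vectors.

    Entry [i][j] is the fraction of input clusterings that place
    items i and j in the same cluster. (Hypothetical helper, written
    for illustration only.)
    """
    if not labelings:
        raise ValueError("need at least one clustering")
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all clusterings must label the same items")

    # Count, for every pair, how many clusterings agree on co-membership.
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1

    # Normalise the counts by the ensemble size.
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


# Three toy clusterings of four items; label values are arbitrary
# and need not match across clusterings.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
matrix = co_association(ensemble)
```

A combination method would then treat `matrix` as a pairwise similarity and apply, for example, hierarchical clustering to it to obtain the final partition.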

Tossapon Boongoen | Natthakan Iam-on | Simon M. Garrett
