Refining Pairwise Similarity Matrix for Cluster Ensemble Problem with Cluster Relations

Cluster ensemble methods have recently emerged as powerful techniques that aggregate several input clusterings of a dataset into a single output clustering with improved robustness and stability. This paper presents two new link-based similarity matrices, which are empirically evaluated against the standard co-association matrix on six datasets (both artificial and real) using four different combination methods and six clustering validity criteria. In all cases, the results suggest that the new link-based similarity matrices efficiently extract the information embedded in the input clusterings and regularly yield higher clustering quality than the co-association matrix.
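For orientation, the baseline co-association matrix is commonly defined as the fraction of input clusterings in which two data points fall in the same cluster. The following is a minimal sketch of that standard definition only, not of the link-based matrices proposed in the paper; the function name `co_association` and the toy ensemble are illustrative assumptions.

```python
def co_association(labelings):
    """Return the co-association matrix of an ensemble.

    labelings: list of M label lists, each of length n (one per base clustering).
    Entry [i][j] is the fraction of base clusterings that put points i and j
    in the same cluster. Names and input layout are assumptions for this sketch.
    """
    m = len(labelings)
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalise the agreement counts by the ensemble size.
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


# Toy ensemble: three base clusterings of four points (hypothetical data).
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
ca = co_association(ensemble)
```

Cluster labels are only compared within a single base clustering, so the result does not depend on how each clustering happens to number its clusters. A consensus function would then treat this matrix as a pairwise similarity when producing the final partition.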
