Using diversity in cluster ensembles

The pairwise approach to cluster ensembles builds, for each of multiple partitions, a coincidence matrix over all pairs of objects. The matrices are then combined, and a final clustering is derived from the combined matrix. Here we study the diversity within such cluster ensembles. Based on this study, we propose a variant of the generic ensemble method in which the number of overproduced clusters is chosen randomly for every ensemble member (partition). Using three artificial data sets, we show that this approach increases the spread of the diversity within the ensemble, thereby leading to a better match with the known cluster labels. Experimental results with three real data sets are also reported.
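
The pipeline summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the base clusterer nor the rule for deriving the final clustering, so plain k-means and a thresholded connected-components cut are stand-ins chosen here. The function names and the parameters `k_min`, `k_max`, and `threshold` are hypothetical.

```python
import random


def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed base clusterer); one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def coassociation(points, n_members, k_min, k_max, rng):
    """Combine the pairwise coincidence matrices of n_members partitions."""
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_members):
        # The proposed variant: draw the number of overproduced clusters
        # at random for every ensemble member.
        k = rng.randint(k_min, k_max)
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Entry (i, j) is the fraction of partitions that put i and j together.
    return [[c / n_members for c in row] for row in counts]


def consensus(M, threshold=0.5):
    """Assumed final step: connected components of pairs with M[i][j] > threshold."""
    n = len(M)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and M[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Calling `coassociation(points, 10, 2, 5, random.Random(0))` on a list of coordinate tuples and passing the result to `consensus` yields one label per object. Setting `k_min` equal to `k_max` gives the fixed-k ensemble, which makes it easy to compare the two settings on the same data.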
