Analysis of consensus partition in cluster ensemble

In combination of multiple partitions, one is usually interested in deriving a consensus solution with a quality better than that of given partitions. Several recent studies have empirically demonstrated improved accuracy of clustering ensembles on a number of artificial and real-world data sets. Unlike certain multiple supervised classifier systems, convergence properties of unsupervised clustering ensembles remain unknown for conventional combination schemes. In this paper, we present formal arguments on the effectiveness of cluster ensemble from two perspectives. The first is based on a stochastic partition generation model related to re-labeling and consensus function with plurality voting. The second is to study the property of the "mean" partition of an ensemble with respect to a metric on the space of all possible partitions. In both the cases, the consensus solution can be shown to converge to a true underlying clustering solution as the number of partitions in the ensemble increases. This paper provides a rigorous justification for the use of cluster ensemble.

[1]  Ana L. N. Fred,et al.  Robust data clustering , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[2]  Amir Dembo,et al.  Large Deviations Techniques and Applications , 1998 .

[3]  Ana L. N. Fred,et al.  Data clustering using evidence accumulation , 2002, Object recognition supported by user interaction for service robots.

[4]  J. Munkres ALGORITHMS FOR THE ASSIGNMENT AND TRANSIORTATION tROBLEMS* , 1957 .

[5]  Sandrine Dudoit,et al.  Bagging to Improve the Accuracy of A Clustering Procedure , 2003, Bioinform..

[6]  Joachim M. Buhmann,et al.  Bagging for Path-Based Clustering , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Anil K. Jain,et al.  A Mixture Model for Clustering Ensembles , 2004, SDM.

[8]  Ana L. N. Fred,et al.  Finding Consistent Clusters in Data Partitions , 2001, Multiple Classifier Systems.

[9]  Mübeccel Demirekler,et al.  Plurality voting-based multiple classifier systems: statistically independent with respect to dependent classifier sets , 2002, Pattern Recognit..

[10]  Joydeep Ghosh,et al.  Cluster Ensembles --- A Knowledge Reuse Framework for Combining Multiple Partitions , 2002, J. Mach. Learn. Res..

[11]  Marina Meila,et al.  Comparing Clusterings by the Variation of Information , 2003, COLT.

[12]  Anil K. Jain,et al.  Combining multiple weak clusterings , 2003, Third IEEE International Conference on Data Mining.

[13]  Vipin Kumar,et al.  A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs , 1998, SIAM J. Sci. Comput..

[14]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[15]  Steven J. Simske,et al.  Performance analysis of pattern classifier combination by plurality voting , 2003, Pattern Recognit. Lett..