Clustering aggregation by probability accumulation

Because a large number of clustering algorithms exist, aggregating different partitions of the same data into a single consolidated partition, in order to obtain better results, has become an important problem. In Fred and Jain's evidence accumulation algorithm, a co-association matrix is constructed from the original partition labels, and a minimum spanning tree algorithm is then applied to this matrix to obtain the combined clustering. In this paper, we propose a novel clustering aggregation scheme, probability accumulation. In this algorithm, the construction of the correlation matrices takes the cluster sizes of the original clusterings into consideration. An alternative, improved algorithm with additional pre- and post-processing is also proposed. Experimental results on both synthetic and real data sets show that the proposed algorithms perform better than evidence accumulation, as well as some other methods.
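To make the baseline concrete, the following is a minimal sketch of the evidence accumulation idea as described above, not of the proposed probability accumulation scheme, whose size-dependent weighting is not specified in this abstract. The function names `co_association` and `single_link_cut`, the toy partitions, and the threshold of 0.5 are illustrative assumptions. Merging every pair whose co-association reaches the threshold gives the same clusters as cutting the weak edges of a spanning tree built on the matrix (single-link clustering).

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / m for c in row] for row in counts]


def single_link_cut(matrix, threshold):
    """Group points linked by co-association >= threshold (union-find).

    The threshold is an assumed tuning parameter for this sketch.
    """
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] >= threshold:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three hypothetical partitions of six points (toy data, not from the paper).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 2, 2, 2],
    [0, 0, 0, 0, 1, 1],
]
C = co_association(partitions)
consensus = single_link_cut(C, threshold=0.5)
print(consensus)
```

Each entry of `C` counts every partition equally, regardless of how large the shared cluster is; this is the point at which, according to the abstract, probability accumulation instead takes cluster sizes into account.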
