Solving cluster ensemble problems by bipartite graph partitioning

A central problem in cluster ensemble research is how to combine multiple clusterings into a single final clustering that is superior to the individual ones. We solve this problem by reducing it to a graph partitioning problem, for which advanced partitioning techniques are available. We introduce a new reduction method that constructs a bipartite graph from a given cluster ensemble. The resulting graph models both the instances and the clusters of the ensemble as vertices. This approach retains all of the information provided by the ensemble, so that the similarity among instances and the similarity among clusters are considered jointly in forming the final clustering. Further, the resulting graph partitioning problem can be solved efficiently. We empirically evaluate the proposed approach against two commonly used graph formulations and show that it is more robust and achieves comparable or better performance.
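The bipartite construction described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each instance vertex is joined by an edge of weight 1 to every cluster vertex that contains it, and that no edges run between two instances or between two clusters. The function names `connectivity_matrix` and `bipartite_graph` are hypothetical, and the partitioning step itself is left to whichever graph partitioner is preferred.

```python
import numpy as np


def connectivity_matrix(labelings):
    """Return the instance-by-cluster matrix A of an ensemble.

    labelings: list of label sequences, one per base clustering,
    each of length n (the number of instances).
    A[i, j] = 1 if instance i belongs to cluster j, else 0.
    Unit weights are an assumption of this sketch.
    """
    blocks = []
    for labels in labelings:
        labels = np.asarray(labels)
        cluster_ids = np.unique(labels)
        # One indicator column per cluster of this base clustering.
        blocks.append((labels[:, None] == cluster_ids[None, :]).astype(float))
    return np.hstack(blocks)


def bipartite_graph(A):
    """Return the adjacency matrix W of the bipartite graph.

    Vertices 0..n-1 are instances and vertices n..n+t-1 are clusters.
    Only instance-cluster edges exist, so W = [[0, A], [A^T, 0]].
    """
    n, t = A.shape
    W = np.zeros((n + t, n + t))
    W[:n, n:] = A
    W[n:, :n] = A.T
    return W


if __name__ == "__main__":
    # Toy ensemble: two base clusterings of four instances.
    ensemble = [[0, 0, 1, 1], [0, 1, 1, 1]]
    A = connectivity_matrix(ensemble)
    W = bipartite_graph(A)
    print(A.shape, W.shape)
```

Partitioning the vertices of `W` into k parts assigns both instances and clusters to the same k groups; the group labels of the first n vertices then give the final clustering of the instances.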
