Cluster ensembles: a knowledge reuse framework for combining multiple partitions

This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, we propose three effective and efficient techniques for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of our techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. We evaluate the effectiveness of cluster ensembles in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms worked on non-identical sets of objects, and (iii) where a common data-set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data-sets.
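
The objective and the first combiner described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each partitioning is a flat label vector over the same objects, and it normalizes mutual information by the geometric mean of the two entropies, a choice the abstract does not specify. All function names (`nmi`, `average_nmi`, `co_association`, `supra_consensus`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in nats) of a label vector."""
    _, counts = np.unique(np.ravel(labels), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def mutual_info(a, b):
    """Mutual information (in nats) between two label vectors."""
    _, ai = np.unique(np.ravel(a), return_inverse=True)
    _, bi = np.unique(np.ravel(b), return_inverse=True)
    # Joint distribution of cluster memberships (contingency table).
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))


def nmi(a, b):
    """Normalized mutual information.

    Assumption: normalization by sqrt(H(a) * H(b)); returns 0.0 when
    either labeling has a single cluster (zero entropy).
    """
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 or hb == 0.0:
        return 0.0
    return mutual_info(a, b) / np.sqrt(ha * hb)


def average_nmi(candidate, partitionings):
    """Objective: mean NMI shared between a candidate and all inputs."""
    return float(np.mean([nmi(candidate, p) for p in partitionings]))


def co_association(partitionings):
    """Induced similarity: fraction of partitionings in which two
    objects share a cluster. Returns an (n, n) matrix that any
    similarity-based clustering algorithm could then recluster."""
    P = np.asarray(partitionings)  # shape (r, n)
    return (P[:, :, None] == P[:, None, :]).mean(axis=0)


def supra_consensus(candidates, partitionings):
    """Pick the candidate clustering with the highest objective value."""
    return max(candidates, key=lambda c: average_nmi(c, partitionings))


# Three partitionings of four objects; the second only permutes labels.
partitionings = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
candidates = [[0, 1, 0, 1], [0, 0, 1, 1]]
best = supra_consensus(candidates, partitionings)
similarity = co_association(partitionings)
```

Because the objective depends only on label vectors, this sketch never touches the original features or algorithms, which is the constraint the framework imposes. The hypergraph and meta-cluster combiners are not shown here.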
