A Self-Supervised Framework for Clustering Ensemble

Clustering ensemble refers to combining a number of base clusterings of a data set into a single consensus clustering. In this paper, we propose a novel self-supervised learning framework for clustering ensemble. Specifically, we treat the base clusterings as pseudo class labels and learn a classifier for each of them. By placing priors on the parameters of these classifiers, we capture the relationships between the base clusterings and, at the same time, obtain a single consolidated clustering result. The framework can also incorporate the original data features to improve clustering ensemble performance. Another advantage, which distinguishes the proposed framework from traditional clustering ensemble approaches, is its generalization capability: it can assign incoming data instances to the consensus clusters directly from their original data features. We conduct extensive experiments on multiple real-world data sets to show the effectiveness of our method.
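
The pseudo-label idea can be illustrated with a minimal sketch. It is not the method proposed in the paper: a nearest-centroid classifier stands in for the classifiers the abstract mentions, the priors that couple the classifiers and yield the consensus clustering are omitted, and the data, label vectors, and function names (`fit_nearest_centroid`, `predict`) are all hypothetical. The sketch shows only that each base clustering can serve as training labels for a classifier on the original features, and that such a classifier can then label an unseen instance.

```python
from collections import defaultdict
from math import dist


def fit_nearest_centroid(points, pseudo_labels):
    """Learn one centroid per pseudo-class from a single base clustering."""
    groups = defaultdict(list)
    for point, label in zip(points, pseudo_labels):
        groups[label].append(point)
    # Each centroid is the coordinate-wise mean of its group's members.
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in groups.items()
    }


def predict(centroids, point):
    """Assign a point to the pseudo-class with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Toy data: two well-separated groups in 2-D (hypothetical values).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

# Two hypothetical base clusterings of the same six points.
# They describe the same partition but name the clusters differently.
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]

# One classifier per base clustering, trained on the original features.
classifiers = [fit_nearest_centroid(points, labels) for labels in base_clusterings]

# An unseen instance is labelled from its features alone.
new_point = (4.9, 5.3)
predictions = [predict(c, new_point) for c in classifiers]
print(predictions)
```

In this toy case the two classifiers give the new point different label names for the same group, which is the kind of relationship between base clusterings that a consensus step would have to reconcile.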
