Constraint Co-Projections for Semi-Supervised Co-Clustering

Co-clustering aims to cluster objects and features simultaneously in order to uncover intercorrelated patterns. However, good co-clustering results are usually difficult to obtain by analyzing the object-feature correlation data alone, because such data are often sparse and noisy. Moreover, most co-clustering algorithms cannot take prior information into account and may therefore produce meaningless results. Semi-supervised co-clustering aims to incorporate known prior knowledge into the co-clustering algorithm. In this paper, we present a new technique named constraint co-projections for semi-supervised co-clustering (CPSSCC). Constraint co-projections not only make use of two popular techniques, pairwise constraints and constraint projections, but also perform object constraint projections and feature constraint projections simultaneously. We show how these two techniques apply to semi-supervised co-clustering when some objects and features are believed a priori to belong to the same cluster. Furthermore, we prove that the co-clustering problem can be formulated as a typical eigenproblem and solved efficiently using the selected eigenvectors. To the best of our knowledge, this paper is the first to propose constraint co-projections and the CPSSCC method. Extensive experiments on benchmark data sets demonstrate the effectiveness of the proposed method. This paper also shows that CPSSCC has some favorable properties compared with previous related co-clustering algorithms.
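
The abstract does not spell out the CPSSCC objective. As a rough illustration of the two ingredients it names, pairwise constraints and an eigenvector-based constraint projection, the following is a minimal NumPy sketch under stated assumptions. It assumes a projection objective in the style of the constraint-projection literature: find an orthonormal matrix W that maximizes the summed squared distance of projected cannot-link pairs minus gamma times that of must-link pairs, which reduces to taking the leading eigenvectors of X^T L X, where L is the Laplacian of a signed constraint graph. The names `constraint_projection` and `co_project`, the weight `gamma`, and the choice to project objects and features independently are hypothetical simplifications, not the paper's method; CPSSCC may couple the two sides differently.

```python
import numpy as np


def constraint_projection(X, must_link, cannot_link, dim, gamma=1.0):
    """Hypothetical sketch of a pairwise-constraint projection.

    X           : (n_items, n_dims) matrix, one row per item.
    must_link   : list of (i, j) pairs that should end up close.
    cannot_link : list of (i, j) pairs that should end up far apart.
    Returns an (n_dims, dim) matrix W with orthonormal columns that
    maximizes trace(W^T X^T L X W), i.e. the summed squared distance of
    projected cannot-link pairs minus gamma times that of must-link pairs.
    """
    n = X.shape[0]
    S = np.zeros((n, n))
    for i, j in cannot_link:
        S[i, j] = S[j, i] = 1.0       # reward separating these pairs
    for i, j in must_link:
        S[i, j] = S[j, i] = -gamma    # penalize separating these pairs
    L = np.diag(S.sum(axis=1)) - S    # Laplacian of the signed constraint graph
    M = X.T @ L @ X                   # symmetric (n_dims, n_dims) matrix
    _, eigvecs = np.linalg.eigh(M)    # eigenvalues come back in ascending order
    return eigvecs[:, ::-1][:, :dim]  # keep the `dim` leading eigenvectors


def co_project(A, obj_ml, obj_cl, feat_ml, feat_cl, dim, gamma=1.0):
    """Project objects (rows of A) and features (columns of A).

    Simplification (assumption): the two sides are projected
    independently; CPSSCC may couple them differently.
    """
    W_obj = constraint_projection(A, obj_ml, obj_cl, dim, gamma)
    W_feat = constraint_projection(A.T, feat_ml, feat_cl, dim, gamma)
    return A @ W_obj, A.T @ W_feat


if __name__ == "__main__":
    # Toy object-feature matrix: 3 objects x 2 features.
    A = np.array([[0.0, 0.0], [0.0, 3.0], [2.0, 0.0]])
    Y_obj, Y_feat = co_project(A, obj_ml=[(0, 1)], obj_cl=[(0, 2)],
                               feat_ml=[], feat_cl=[(0, 1)], dim=1)
    print("must-link gap:", abs(Y_obj[0, 0] - Y_obj[1, 0]))
    print("cannot-link gap:", abs(Y_obj[0, 0] - Y_obj[2, 0]))
```

In the toy matrix, the must-link objects 0 and 1 differ only along the second feature and the cannot-link objects 0 and 2 only along the first, so X^T L X = diag(4, -9) and the one-dimensional projection keeps the first axis: the must-link gap is approximately 0 and the cannot-link gap approximately 2. A standard clusterer such as k-means could then be run on the projected rows and columns.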
