Co-Clustering Ensembles Based on Multiple Relevance Measures

Co-clustering aims to discover groups of both objects and features from a given data matrix. Co-clustering ensembles can produce robust co-clusters by combining multiple base co-clusterings. However, current co-clustering ensemble solutions ignore either the constraints resulting from feature-to-feature and object-to-object relevance information, or the feature-to-object relevance information. In this paper, we argue that all three information sources contribute to good consensus solutions, and propose a co-clustering ensemble (CoCE) approach based on multiple relevance measures. CoCE first evaluates the quality of the base co-clusters and then measures feature-to-object relevance. This relevance, together with feature-to-feature and object-to-object relevance measures, defines a hybrid graph. The consensus process operates on the resulting hybrid graph: it is formulated as a trace minimization problem, and a block-wise matrix multiplication technique is introduced to perform the optimization. Experimental results on various datasets show that CoCE not only frequently outperforms other related co-clustering ensembles, but also has a lower runtime cost and is more robust to poor base co-clusterings.
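
To make the hybrid-graph idea concrete, the following is a minimal sketch and not the CoCE algorithm itself. It assumes the three relevance sources are already available as matrices (the names `S_obj`, `S_feat`, and `R` are hypothetical), stacks them into one block affinity matrix, and solves a generic relaxed trace minimization, min tr(H^T L H) subject to H^T H = I with the unnormalized Laplacian L = D - W, by taking the eigenvectors of the smallest eigenvalues. The quality weighting of base co-clusters and the block-wise multiplication scheme described in the abstract are not reproduced here.

```python
import numpy as np

def hybrid_graph(S_obj, S_feat, R):
    """Stack object-object (n x n), feature-feature (m x m) and
    object-feature (n x m) relevance into one symmetric affinity matrix.
    The layout is an illustrative assumption, not the paper's definition."""
    return np.block([[S_obj, R], [R.T, S_feat]])

def trace_min_embedding(W, k):
    """Relaxed solution of min tr(H^T L H) s.t. H^T H = I, with L = D - W.
    The minimizer is spanned by the k eigenvectors of L with the
    smallest eigenvalues (standard spectral relaxation)."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    H = eigvecs[:, :k]
    return H, float(np.trace(H.T @ L @ H))

# Toy data: 4 objects and 3 features with two planted co-clusters.
S_obj = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
S_feat = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]], dtype=float)
R = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)

W = hybrid_graph(S_obj, S_feat, R)        # 7 x 7: objects first, then features
H, objective = trace_min_embedding(W, k=2)
```

In this toy graph the two co-clusters are disconnected, so objects and features belonging to the same co-cluster receive identical rows in `H`; running k-means (or any standard clustering) on the rows of `H` would then assign objects and features to consensus co-clusters jointly.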
