Paper information - Co-Clustering Ensembles Based on Multiple Relevance Measures

Co-Clustering Ensembles Based on Multiple Relevance Measures

Co-clustering aims to discover groups of both objects and features in a given data matrix. Co-clustering ensembles can produce robust co-clusters by combining multiple base co-clusterings. However, current co-clustering ensemble solutions either ignore the constraints resulting from feature-to-feature and object-to-object relevance information, or ignore feature-to-object relevance information. In this paper, we advocate that all three information sources contribute to good consensus solutions, and propose a co-clustering ensemble (CoCE) approach based on multiple relevance measures. CoCE first evaluates the quality of the base co-clusters and accordingly measures feature-to-object relevance. This measure, together with the feature-to-feature and object-to-object relevance measures, defines a hybrid graph. The consensus process operates on the resulting hybrid graph: it is formulated as a trace minimization problem and uses a block-wise matrix multiplication technique to perform the optimization. Experimental results on various datasets show that CoCE not only frequently outperforms other related co-clustering ensembles, but also has a lower runtime cost and is more robust to poor base co-clusterings.
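The abstract does not state the exact CoCE objective, so the following is only a minimal sketch of the general idea, under stated assumptions: the three relevance measures are given as non-negative matrices, the hybrid graph is the block adjacency matrix built from them, and the trace minimization is the standard spectral relaxation min tr(HᵀLH) subject to HᵀH = I, solved with an eigendecomposition. All function names, variable names, and toy data here are hypothetical and are not taken from the paper.

```python
import numpy as np


def symmetrize(A):
    """Return a symmetric, zero-diagonal affinity matrix (toy helper)."""
    S = (A + A.T) / 2.0
    np.fill_diagonal(S, 0.0)
    return S


def hybrid_laplacian(W_oo, W_ff, R):
    """Unnormalized Laplacian of the assumed hybrid graph [[W_oo, R], [R.T, W_ff]]."""
    W = np.block([[W_oo, R], [R.T, W_ff]])
    D = np.diag(W.sum(axis=1))
    return D - W


def hybrid_matvec(W_oo, W_ff, R, H):
    """Multiply the hybrid adjacency by H block by block, without assembling it."""
    n = W_oo.shape[0]
    H_o, H_f = H[:n], H[n:]          # object rows, feature rows
    top = W_oo @ H_o + R @ H_f
    bottom = R.T @ H_o + W_ff @ H_f
    return np.vstack([top, bottom])


def trace_min_embedding(L, k):
    """Minimize tr(H^T L H) s.t. H^T H = I via the k smallest eigenvectors of L."""
    eigenvalues, eigenvectors = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigenvectors[:, :k], eigenvalues[:k].sum()


# Toy relevance matrices (random placeholders, not real data).
rng = np.random.default_rng(0)
n_objects, n_features, k = 4, 3, 2
W_oo = symmetrize(rng.random((n_objects, n_objects)))   # object-to-object
W_ff = symmetrize(rng.random((n_features, n_features)))  # feature-to-feature
R = rng.random((n_objects, n_features))                  # feature-to-object

L = hybrid_laplacian(W_oo, W_ff, R)
H, objective = trace_min_embedding(L, k)
```

In this sketch the rows of `H` give a joint embedding of objects (first `n_objects` rows) and features (remaining rows); a consensus co-clustering could then be read off by clustering those rows, for example with k-means. `hybrid_matvec` illustrates why a block-wise product can save work: the full hybrid matrix never needs to be materialized when only products with `H` are required.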
