Ensemble Block Co-clustering: A Unified Framework for Text Data

In this paper, we propose a unified framework for Ensemble Block Co-clustering (EBCO), which aims to fuse multiple basic co-clusterings into a consensus structured affinity matrix. Each co-clustering to be fused is obtained by applying a co-clustering method to the same document-term dataset. This fusion process reinforces the individual quality of the basic co-clusterings by combining them within a single consensus matrix. Moreover, the proposed framework enables fully unsupervised co-clustering, in which the number of co-clusters is automatically inferred from the non-trivial generalized modularity. We first define an explicit objective function that allows the aggregation of the basic co-clusterings and the consensus block co-clustering to be learned jointly. We then show that EBCO generalizes one-sided ensemble clustering to the ensemble block co-clustering setting. We also establish its theoretical equivalence to spectral co-clustering and to weighted double spherical k-means clustering for textual data. Experimental results on various real-world document-term datasets demonstrate that EBCO is an efficient competitor to several state-of-the-art ensemble and co-clustering methods.
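The aggregation idea can be illustrated with a minimal pure-Python sketch. It rests on two assumptions that are ours and not necessarily EBCO's exact formulation. First, each basic co-clustering is taken as a diagonal block structure, so document co-cluster k is paired with term co-cluster k. Second, the consensus affinity matrix is taken as the plain average of the document-term block indicators. The sketch then scores a few candidate consensus co-clusterings with the standard bipartite co-clustering modularity and keeps the highest-scoring one, which fixes the number of co-clusters. EBCO's joint objective and its non-trivial generalized modularity are not reproduced. The helper names `block_indicator`, `consensus_matrix` and `bipartite_modularity`, as well as the toy data, are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
CoClustering = Tuple[List[int], List[int]]  # (document labels, term labels)


def block_indicator(row_labels: Sequence[int], col_labels: Sequence[int]) -> Matrix:
    """1 where document i and term j fall in the same (diagonal) co-cluster, else 0.

    Assumption: co-cluster k pairs document cluster k with term cluster k.
    """
    return [[1.0 if r == c else 0.0 for c in col_labels] for r in row_labels]


def consensus_matrix(basic: Sequence[CoClustering]) -> Matrix:
    """Hypothetical aggregation: average the block indicators of the basic co-clusterings."""
    n_rows, n_cols = len(basic[0][0]), len(basic[0][1])
    counts = [[0.0] * n_cols for _ in range(n_rows)]
    for row_labels, col_labels in basic:
        indicator = block_indicator(row_labels, col_labels)
        for i in range(n_rows):
            for j in range(n_cols):
                counts[i][j] += indicator[i][j]
    # Dividing once at the end keeps unanimous entries exactly 0.0 or 1.0.
    return [[v / len(basic) for v in row] for row in counts]


def bipartite_modularity(a: Matrix, row_labels: Sequence[int], col_labels: Sequence[int]) -> float:
    """Standard co-clustering modularity (not EBCO's generalized modularity):
    Q = (1/|A|) * sum_ij (a_ij - a_i. * a_.j / |A|) * [row_labels[i] == col_labels[j]].
    """
    total = sum(sum(row) for row in a)
    row_sums = [sum(row) for row in a]
    col_sums = [sum(row[j] for row in a) for j in range(len(a[0]))]
    q = 0.0
    for i, r in enumerate(row_labels):
        for j, c in enumerate(col_labels):
            if r == c:
                q += a[i][j] - row_sums[i] * col_sums[j] / total
    return q / total


# Three basic co-clusterings of a toy 4-document x 4-term dataset (illustrative data).
basic = [
    ([0, 0, 1, 1], [0, 0, 1, 1]),
    ([0, 0, 1, 1], [0, 0, 1, 1]),
    ([0, 0, 0, 1], [0, 0, 1, 1]),
]
A = consensus_matrix(basic)

# Candidate consensus co-clusterings with 1, 2 and 4 co-clusters.
candidates = [
    ([0, 0, 0, 0], [0, 0, 0, 0]),
    ([0, 0, 1, 1], [0, 0, 1, 1]),
    ([0, 0, 0, 1], [0, 0, 1, 1]),
    ([0, 1, 2, 3], [0, 1, 2, 3]),
]
for rows, cols in candidates:
    print(len(set(rows)), "co-clusters:", round(bipartite_modularity(A, rows, cols), 4))

# Keep the candidate with the highest modularity: here, the two-block co-clustering.
best = max(candidates, key=lambda p: bipartite_modularity(A, *p))
print("selected:", best)
```

The single-block candidate always scores zero under this criterion, so maximizing modularity naturally avoids the trivial solution. Restricting the indicator to documents only (1 when two documents share a cluster) and averaging it would give the co-association matrix used in evidence-accumulation ensemble clustering, a one-sided analogue of the consensus matrix above.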
