TWCC: Automated Two-way Subspace Weighting Partitional Co-Clustering

Abstract: A two-way subspace weighting partitional co-clustering method, TWCC, is proposed. The method introduces two types of subspace weights that weight the data in two ways simultaneously: columns on row clusters and rows on column clusters. An objective function is defined that uses both types of weights in the distance function to determine the co-clusters, and an iterative TWCC algorithm is proposed to optimize it, with the two types of subspace weights computed automatically. A series of experiments on synthetic and real-life data was conducted to investigate the properties of TWCC, to compare its two-way clustering results with those of eight co-clustering algorithms, and to compare its one-way clustering results with those of six clustering algorithms. The results show that TWCC is robust and effective for large high-dimensional data.
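To make the idea of weighting "columns on row clusters and rows on column clusters" concrete, the following is a minimal sketch of one way a two-way weighted co-clustering cost could be evaluated. It is not the TWCC objective, which the abstract does not state. The function name `weighted_cocluster_cost`, the weight matrices `U` and `V`, the co-cluster centers `Z`, and the multiplicative weighted squared-error form are all assumptions made for illustration.

```python
import numpy as np


def weighted_cocluster_cost(X, row_labels, col_labels, U, V, Z):
    """Illustrative two-way weighted squared-error cost (assumed form).

    X          : (n, m) data matrix
    row_labels : (n,) row-cluster index of each row, values in [0, k)
    col_labels : (m,) column-cluster index of each column, values in [0, l)
    U          : (k, m) weight of each column on each row cluster (assumed)
    V          : (l, n) weight of each row on each column cluster (assumed)
    Z          : (k, l) co-cluster centers (assumed)
    """
    X = np.asarray(X, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)

    # Column weight seen by entry (i, j): weight of column j on row i's cluster.
    col_w = U[row_labels, :]                # shape (n, m)
    # Row weight seen by entry (i, j): weight of row i on column j's cluster.
    row_w = V[col_labels, :].T              # shape (n, m)
    # Center of the co-cluster that entry (i, j) belongs to.
    centers = Z[row_labels][:, col_labels]  # shape (n, m)

    return float(np.sum(col_w * row_w * (X - centers) ** 2))


# Tiny usage example with uniform weights and all-zero centers.
X = np.ones((2, 2))
row_labels = np.array([0, 1])
col_labels = np.array([0, 1])
U = np.full((2, 2), 0.5)
V = np.full((2, 2), 0.5)
Z = np.zeros((2, 2))
cost = weighted_cocluster_cost(X, row_labels, col_labels, U, V, Z)
print(cost)  # 4 entries * 0.5 * 0.5 * 1.0 = 1.0
```

In an iterative scheme of the kind the abstract describes, a cost like this would be re-evaluated while the labels, centers, and both weight matrices are updated in turn. The update rules themselves are not given in the abstract and are not sketched here.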
