Learning A Structured Optimal Bipartite Graph for Co-Clustering

Co-clustering methods have been widely applied to document clustering and gene expression analysis. These methods exploit the duality between features and samples so that the co-occurring structure of sample and feature clusters can be extracted. In graph-based co-clustering methods, a bipartite graph is constructed to depict the relation between features and samples. Most existing co-clustering methods conduct clustering on the graph obtained from the original data matrix, which does not have an explicit cluster structure, so they require a post-processing step to obtain the clustering results. In this paper, we propose a novel co-clustering method that learns a bipartite graph with exactly k connected components, where k is the number of clusters. The new bipartite graph learned in our model approximates the original graph but maintains an explicit cluster structure, from which we can immediately obtain the clustering results without post-processing. Extensive empirical results are presented to verify the effectiveness and robustness of our model.
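To illustrate why a bipartite graph with exactly k connected components needs no post-processing, the following minimal sketch reads sample and feature clusters directly off the components of such a graph. It does not reproduce the proposed learning procedure: the function name `co_clusters`, the `tol` threshold, and the toy weight matrix `B` are assumptions made for illustration, and `B` stands in for a graph that has already been learned.

```python
from collections import deque


def co_clusters(B, tol=0.0):
    """Label samples (rows) and features (columns) of B by connected component.

    B[i][j] is the edge weight between sample i and feature j in the
    bipartite graph; weights <= tol are treated as "no edge" (assumption).
    """
    n = len(B)
    d = len(B[0]) if n else 0

    # Nodes 0..n-1 are samples, nodes n..n+d-1 are features.
    adj = [[] for _ in range(n + d)]
    for i in range(n):
        for j in range(d):
            if B[i][j] > tol:
                adj[i].append(n + j)
                adj[n + j].append(i)

    labels = [-1] * (n + d)
    k = 0
    for start in range(n + d):
        if labels[start] != -1:
            continue
        # Breadth-first search marks every node reachable from `start`.
        labels[start] = k
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = k
                    queue.append(v)
        k += 1

    return labels[:n], labels[n:], k


# Hypothetical toy graph: 3 samples, 4 features, two disconnected blocks.
B = [
    [0.9, 0.8, 0.0, 0.0],
    [0.7, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.9],
]
sample_labels, feature_labels, k = co_clusters(B)
print(sample_labels, feature_labels, k)
```

Because each connected component contains both sample nodes and feature nodes, a single traversal assigns the two kinds of labels together, which is the co-occurring structure described above. On a graph built from the raw data matrix the components would typically not separate this cleanly, which is why a post-processing step is otherwise needed.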
