Scalable Algorithm for Higher-Order Co-Clustering via Random Sampling

We propose a scalable and efficient algorithm for co-clustering a higher-order tensor. Viewing tensors as hypergraphs, we formulate the co-clustering of a tensor as the problem of partitioning the corresponding hypergraph. Our algorithm is based on the random sampling technique, which has been successfully applied to graph cut problems. We extend a random sampling algorithm for the graph multiway cut problem to hypergraphs, and design a co-clustering algorithm based on it. Each iteration of our algorithm runs in time polynomial in the size of the hypergraph, and thus it performs well even for higher-order tensors, which are difficult for state-of-the-art algorithms to handle.
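As a rough illustration of the idea, the following Python sketch treats each nonzero tensor entry as a weighted hyperedge over its (mode, index) vertices and runs a Karger-style random contraction until k super-vertices remain. This is not the algorithm from the paper, whose details are not given in the abstract. The encoding of the tensor as a dictionary, the rule of discarding a sampled hyperedge when contracting it would leave fewer than k groups, and all function names (`tensor_to_hypergraph`, `contract_trial`, `co_cluster`) are assumptions made for this example.

```python
import random


def tensor_to_hypergraph(tensor):
    """Assumed encoding: `tensor` maps index tuples (i, j, k, ...) to weights.
    Each (mode, index) pair is a vertex; each positive entry is a hyperedge."""
    edges = []
    for index, weight in tensor.items():
        if weight > 0:
            edges.append((tuple(enumerate(index)), weight))
    return edges


def _find(parent, v):
    # Union-find lookup with path halving.
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def contract_trial(edges, k, rng):
    """One random-contraction trial: returns (cut weight, vertex -> group label)."""
    parent = {v: v for e, _ in edges for v in e}
    n_groups = len(parent)
    live = list(range(len(edges)))  # hyperedges still spanning several groups
    while n_groups > k and live:
        # Sample a hyperedge with probability proportional to its weight.
        pick = rng.choices(live, weights=[edges[i][1] for i in live])[0]
        roots = list({_find(parent, v) for v in edges[pick][0]})
        if n_groups - (len(roots) - 1) >= k:
            # Contract: merge every group touched by the sampled hyperedge.
            for r in roots[1:]:
                parent[r] = roots[0]
            n_groups -= len(roots) - 1
        else:
            # Simplifying assumption: skip edges that would overshoot k groups.
            live.remove(pick)
        # Drop hyperedges that now lie inside a single group.
        live = [i for i in live
                if len({_find(parent, v) for v in edges[i][0]}) > 1]
    labels = {v: _find(parent, v) for v in parent}
    cut = sum(w for e, w in edges if len({labels[v] for v in e}) > 1)
    return cut, labels


def co_cluster(tensor, k, trials=100, seed=0):
    """Repeat independent trials and keep the partition with the lightest cut."""
    rng = random.Random(seed)
    edges = tensor_to_hypergraph(tensor)
    return min((contract_trial(edges, k, rng) for _ in range(trials)),
               key=lambda result: result[0])


if __name__ == "__main__":
    # Tiny 3rd-order example: two entries sharing mode-0 index 0, one separate.
    toy = {(0, 0, 0): 2.0, (0, 1, 1): 2.0, (1, 2, 2): 2.0}
    cut, labels = co_cluster(toy, k=2)
    print(cut, labels)
```

Each trial does a polynomial amount of work in the number of hyperedges, and independent trials can be run in parallel. The returned labels group row, column, and slice indices together, which is what makes the partition a co-clustering rather than a clustering of a single mode.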
