LP-Based Pivoting Algorithm for Higher-Order Correlation Clustering

Correlation clustering is an approach for clustering a set of objects from given pairwise information. In this approach, the pairwise information is usually represented by an undirected graph whose nodes correspond to the objects, and each edge is assigned a nonnegative weight and either a positive or a negative label. A clustering is then obtained by solving an optimization problem: find a partition of the node set that minimizes the disagreement with, or maximizes the agreement with, the pairwise information. In this paper, we extend correlation clustering with disagreement minimization to handle higher-order relationships represented by hypergraphs. We give two pivoting algorithms based on a linear programming relaxation of the problem. The first achieves an \(O(k \log n)\)-approximation, where \(n\) is the number of nodes and \(k\) is the maximum size of the hyperedges with negative labels; this algorithm applies to arbitrary hyperedges with arbitrary weights. The second is an \(O(r)\)-approximation for complete \(r\)-partite hypergraphs with uniform weights. Hypergraphs of this type arise in the co-clustering setting of correlation clustering.
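To make the objective and the pivoting idea concrete, here is a minimal sketch under stated assumptions; it is not the paper's algorithm. It assumes that a positive hyperedge incurs its weight when its nodes are split across clusters and a negative hyperedge incurs its weight when all of its nodes fall into one cluster. The pivot step is a generic threshold rule borrowed from LP-based rounding for ordinary graph correlation clustering: it takes LP distances as given input (solving the LP is not shown), picks a uniformly random unclustered pivot, and groups every unclustered node within distance below a fixed threshold (0.5 here, an arbitrary choice). The function names `disagreement` and `threshold_pivot` are hypothetical.

```python
import random
from typing import Callable, FrozenSet, List, Optional, Sequence, Set, Tuple

# Hypothetical representation: (nodes, nonnegative weight, label "+" or "-").
Hyperedge = Tuple[FrozenSet[int], float, str]


def disagreement(partition: Sequence[Set[int]], hyperedges: List[Hyperedge]) -> float:
    """Total weight of violated hyperedges (assumed semantics, see lead-in)."""
    cluster_of = {v: i for i, block in enumerate(partition) for v in block}
    cost = 0.0
    for nodes, weight, label in hyperedges:
        together = len({cluster_of[v] for v in nodes}) == 1
        if label == "+" and not together:    # positive hyperedge split apart
            cost += weight
        elif label == "-" and together:      # negative hyperedge kept whole
            cost += weight
    return cost


def threshold_pivot(
    nodes: Sequence[int],
    dist: Callable[[int, int], float],
    threshold: float = 0.5,
    seed: Optional[int] = None,
) -> List[Set[int]]:
    """Generic graph-style pivot rounding on precomputed LP distances in [0, 1]."""
    rng = random.Random(seed)
    remaining = set(nodes)
    clusters: List[Set[int]] = []
    while remaining:
        pivot = rng.choice(sorted(remaining))          # uniformly random pivot
        cluster = {pivot} | {
            v for v in remaining if v != pivot and dist(pivot, v) < threshold
        }
        clusters.append(cluster)
        remaining -= cluster                           # remove clustered nodes
    return clusters


# Toy example: LP distances are small inside {0, 1} and {2, 3}, 1.0 otherwise.
lp = {frozenset({0, 1}): 0.1, frozenset({2, 3}): 0.2}
clusters = threshold_pivot([0, 1, 2, 3], lambda u, v: lp.get(frozenset({u, v}), 1.0), seed=0)
hyperedges: List[Hyperedge] = [
    (frozenset({0, 1}), 1.0, "+"),
    (frozenset({1, 2, 3}), 2.0, "+"),
    (frozenset({0, 1, 2}), 0.5, "-"),
    (frozenset({2, 3}), 1.5, "-"),
]
print(disagreement(clusters, hyperedges))  # prints 3.5
```

Whichever pivot is drawn first, this toy instance yields the clusters {0, 1} and {2, 3}, so the split positive hyperedge {1, 2, 3} and the intact negative hyperedge {2, 3} are the only violations.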