Correlation Clustering with Noisy Partial Information

In this paper, we propose and study a semi-random model for the Correlation Clustering problem on arbitrary graphs $G$. We give two approximation algorithms for Correlation Clustering instances from this model. For every $\delta > 0$, the first algorithm finds, with high probability, a solution of value $(1+\delta)\,\mathrm{optcost} + O_{\delta}(n\log^3 n)$, where $\mathrm{optcost}$ is the value of the optimal solution. Under some additional assumptions on the instance, the second algorithm recovers the ground truth clustering with an arbitrarily small classification error $\eta$.
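To make the quantity bounded above concrete, the following is a minimal sketch of the standard disagreement objective for Correlation Clustering. It assumes an unweighted instance in which each edge carries a "+" (similar) or "-" (dissimilar) label. The function name `clustering_cost`, the edge encoding, and the toy instance are illustrative assumptions, not notation from the paper, and the sketch does not implement either of the two algorithms.

```python
def clustering_cost(edges, cluster_of):
    """Count the edges that disagree with a clustering.

    edges: iterable of (u, v, label) with label "+" or "-" (assumed encoding).
    cluster_of: dict mapping each vertex to a cluster id.
    A "+" edge disagrees when its endpoints are in different clusters;
    a "-" edge disagrees when its endpoints are in the same cluster.
    """
    cost = 0
    for u, v, label in edges:
        same = cluster_of[u] == cluster_of[v]
        if (label == "+" and not same) or (label == "-" and same):
            cost += 1
    return cost


# Hypothetical toy instance on four vertices.
edges = [("a", "b", "+"), ("b", "c", "+"), ("a", "c", "-"), ("c", "d", "-")]
clustering = {"a": 0, "b": 0, "c": 0, "d": 1}
print(clustering_cost(edges, clustering))
```

Under this convention, the value of the optimal solution is the minimum of this cost over all clusterings of the vertex set.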
