A Survey of Correlation Clustering

Partitioning a set of data points into clusters is a problem that arises in many applications. Correlation clustering is a clustering technique motivated by the problem of document clustering: given a large corpus of documents, such as web pages, we wish to find their optimal partition into clusters. Most commonly used clustering algorithms, such as k-means, k-clustering sum and k-center, require prior knowledge of the number of clusters into which the data should be divided; when classifying web documents, however, determining that number is not a trivial task. Correlation Clustering, introduced by Bansal, Blum and Chawla [1], provides a method for clustering a set of objects into the optimal number of clusters without specifying that number in advance. In this paper we present two different approximation algorithms for the Correlation Clustering problem. We then discuss some open problems and give our intuition as to how to approach them.

[1] Chaitanya Swamy, et al. Correlation Clustering: maximizing agreements via semidefinite programming, 2004, SODA '04.

[2] Amos Fiat, et al. Correlation clustering in general weighted graphs, 2006, Theor. Comput. Sci.

[3] David B. Shmoys, et al. A unified approach to approximation algorithms for bottleneck problems, 1986, JACM.

[4] Venkatesan Guruswami, et al. Clustering with qualitative information, 2005, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.

[5] Nikhil Bansal, et al. Correlation Clustering, 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings.

[6] Hans-Peter Kriegel, et al. Clustering for Mining in Large Spatial Databases, 1998, Künstliche Intell.

[7] Moses Charikar, et al. Maximizing quadratic programs: extending Grothendieck's inequality, 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[8] Nicole Immorlica, et al. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 2003, Lecture Notes in Computer Science.

[9] Amos Fiat, et al. Correlation Clustering - Minimizing Disagreements on Arbitrary Weighted Graphs, 2003, ESA.