Approximating min-sum k-clustering in metric spaces

The min-sum k-clustering problem in a metric space is to find a partition of the space into k clusters so as to minimize the total sum of distances between pairs of points assigned to the same cluster. We give the first non-trivial polynomial-time approximation algorithm for this problem. The algorithm provides an $\ratio$ approximation to the min-sum k-clustering problem in general metric spaces, with running time $\runtime$. The result is based on embedding metric spaces into hierarchically separated trees. We also give a bicriteria approximation result: a solution within a constant factor of the optimal cost that uses only a constant factor more clusters than k. This result is obtained by adapting ideas from recently developed primal-dual approximation algorithms for facility location.
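To make the objective concrete, the following minimal sketch evaluates the min-sum cost of a given partition and finds an optimal partition by exhaustive search on a tiny instance. It is not the approximation algorithm of the paper; it only illustrates the quantity being minimized. The function names (`min_sum_cost`, `brute_force_min_sum`, `line_dist`) and the sample points are hypothetical, and the sketch assumes each unordered pair of points is counted once.

```python
from itertools import combinations, product


def min_sum_cost(points, labels, dist):
    """Sum of distances over unordered pairs of points that share a label."""
    return sum(
        dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
        if labels[i] == labels[j]
    )


def brute_force_min_sum(points, k, dist):
    """Try every assignment of points to k labels (exponential time).

    Only meant for tiny inputs, to illustrate the objective.
    """
    best_labels, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        cost = min_sum_cost(points, labels, dist)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost


def line_dist(a, b):
    """Distance on the real line, a simple example of a metric."""
    return abs(a - b)


points = [0, 1, 2, 10, 11]
labels, cost = brute_force_min_sum(points, 2, line_dist)
print(labels, cost)
```

On this instance the search groups {0, 1, 2} and {10, 11}, since merging the two groups would add every cross-group distance to the cost. The exhaustive search examines k^n assignments, which is why an efficient approximation algorithm is of interest.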
