A constant-factor approximation algorithm for the k-median problem (extended abstract)

1 Introduction

We present the first constant-factor approximation algorithm for the metric k-median problem. The k-median problem is one of the most well-studied clustering problems, i.e., problems in which the aim is to partition a given set of points into clusters so that the points within a cluster are relatively close with respect to some measure.

For the metric k-median problem, we are given n points in a metric space. We must select k of these to be cluster centers, and then assign each input point j to the selected center that is closest to it. If point j is assigned to a center i, we incur a cost proportional to the distance between i and j. The goal is to select the k centers so as to minimize the sum of the assignment costs. We give a 6⅔-approximation algorithm for this problem, that is, a polynomial-time algorithm that finds a feasible solution whose objective function value is within a factor of 6⅔ of the optimum. This improves upon the best previously known result of O(log k log log k), which was obtained by refining and derandomizing a randomized O(log n log log n)-approximation algorithm of Bartal. We also give constant-factor approximation algorithms for several natural extensions of the problem.

Lin & Vitter [18] considered the k-median problem with arbitrary assignment costs, and gave a polynomial-time algorithm that finds, for any ε > 0, a solution whose objective function value is within a factor of 1 + ε of the optimum, but which is infeasible: it opens (1 + 1/ε)(ln n + 1)k cluster centers.
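To make the objective and the meaning of an approximation factor concrete, the following is a minimal Python sketch, assuming points in the Euclidean plane (one particular metric; any distance function satisfying the triangle inequality could be substituted). All function names are hypothetical and chosen for illustration. The exact optimum is found by brute-force enumeration of every k-subset of the input points, which is exponential and is not the algorithm of this paper; it only serves to show what ratio a ρ-approximation guarantee bounds (for the algorithm described here, ρ = 6⅔). The last helper simply evaluates the Lin & Vitter center count (1 + 1/ε)(ln n + 1)k stated above.

```python
import itertools
import math
from typing import Callable, Sequence, Tuple

# Hypothetical types: points are 2-D tuples, the metric is a callable.
Point = Tuple[float, float]
Metric = Callable[[Point, Point], float]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance (assumed metric for this sketch)."""
    return math.dist(p, q)


def kmedian_cost(points: Sequence[Point], centers: Sequence[Point],
                 dist: Metric = euclidean) -> float:
    """Assign every point to its closest selected center and sum the distances."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def brute_force_kmedian(points: Sequence[Point], k: int,
                        dist: Metric = euclidean):
    """Exact optimum by trying every k-subset of the input points as centers.

    Exponential in k; usable only on tiny instances to measure ratios.
    Returns (optimal cost, chosen centers).
    """
    best = None
    for centers in itertools.combinations(points, k):
        cost = kmedian_cost(points, centers, dist)
        if best is None or cost < best[0]:
            best = (cost, centers)
    return best


def approximation_ratio(points: Sequence[Point], centers: Sequence[Point],
                        k: int, dist: Metric = euclidean) -> float:
    """Cost of a given center set divided by the optimal k-median cost."""
    opt_cost, _ = brute_force_kmedian(points, k, dist)
    cost = kmedian_cost(points, centers, dist)
    if opt_cost == 0:
        # Every point can be a center (e.g. k >= number of distinct points).
        return 1.0 if cost == 0 else math.inf
    return cost / opt_cost


def lin_vitter_center_bound(n: int, k: int, eps: float) -> float:
    """Centers opened by the Lin & Vitter bicriteria result: (1 + 1/eps)(ln n + 1)k."""
    return (1 + 1 / eps) * (math.log(n) + 1) * k


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(brute_force_kmedian(pts, 2))                         # optimal cost 2.0
    print(approximation_ratio(pts, [(0.0, 0.0), (1.0, 0.0)], 2))  # 19 / 2 = 9.5
```

On the four-point example, placing both centers in the left pair costs 19 while the optimum costs 2, a ratio of 9.5; a 6⅔-approximation guarantees that the returned solution never exceeds 6⅔ times the optimum on any metric instance.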
Lin & Vitter also provided evidence that this result is best possible via a reduction from the set cover problem. Consequently, it is natural to consider special cases. The problem is solvable in polynomial time on trees [14, 22]. However, for general metric spaces, the problem is NP-hard to solve exactly. Arora, Raghavan & Rao …

[1] George L. Nemhauser, et al. The uncapacitated facility location problem, 1990.

[2] Éva Tardos, et al. An approximation algorithm for the generalized assignment problem, 1993, Math. Program.

[3] Arie Tamir, et al. An O(pn²) algorithm for the p-median and related problems on tree graphs, 1996, Oper. Res. Lett.

[4] Éva Tardos, et al. Approximation algorithms for facility location problems (extended abstract), 1997, STOC '97.

[5] Samir Khuller, et al. Greedy strikes back: improved facility location algorithms, 1998, SODA '98.

[6] Fabián A. Chudak, et al. Improved Approximation Algorithms for Uncapacitated Facility Location, 1998.

[7] Samir Khuller, et al. The Capacitated K-Center Problem, 2000, SIAM J. Discret. Math.

[8] J. Vitter, et al. Approximations with Minimum Packing Constraint Violation, 1992.

[9] Yair Bartal, et al. On approximating arbitrary metrices by tree metrics, 1998, STOC '98.

[10] Yair Bartal, et al. Probabilistic approximation of metric spaces and its algorithmic applications, 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[11] Sudipto Guha, et al. Rounding via Trees: Deterministic Approximation Algorithms for Group, 1998.

[12] A. Frieze, et al. A simple heuristic for the p-centre problem, 1985.

[13] D. Hochbaum, et al. A best possible approximation algorithm for the k-center problem, 1985.

[14] Rajmohan Rajaraman, et al. Analysis of a local search heuristic for facility location problems, 2000, SODA '98.

[15] Jeffrey Scott Vitter, et al. ε-approximations with minimum packing constraint violation (extended abstract), 1992, STOC '92.

[16] Judit Bar-Ilan, et al. How to Allocate Network Centers, 1993, J. Algorithms.

[17] Samir Khuller, et al. The Capacitated K-Center Problem (Extended Abstract), 1996, ESA.

[18] David B. Shmoys, et al. Approximation algorithms for facility location problems, 2000, APPROX.

[19] Satish Rao, et al. Approximation schemes for Euclidean k-medians and related problems, 1998, STOC '98.

[20] Teofilo F. Gonzalez, et al. Clustering to Minimize the Maximum Intercluster Distance, 1985, Theor. Comput. Sci.

[21] S. L. Hakimi. An Algorithmic Approach to Network Location Problems, 1979.

[22] Fabián A. Chudak, et al. Improved approximation algorithms for a capacitated facility location problem, 1999, SODA '99.

[23] O. Kariv, et al. An Algorithmic Approach to Network Location Problems. II: The p-Medians, 1979.