K-median Algorithms: Theory in Practice

We define the distance metric as $d_{ij}$ for $i \in \{1, \ldots, n\}$, $j \in \{1, \ldots, n\}$, where $d_{ij}$ is the distance between points $i$ and $j$ in the metric space $X$. Kariv and Hakimi [1] proved that finding such $k$ medians in a network is NP-hard by reducing the dominating set problem to it. A simple brute-force algorithm would examine every size-$k$ subset of the facility set $F$, find the closest facility in that subset for every client in $C$, and return the subset with the lowest total cost. This algorithm runs in $O\left(\binom{n_f}{k} n_c k\right)$ time, where $|F| = n_f$ and $|C| = n_c$. Because the number of subsets grows combinatorially with $n_f$ and $k$, academic research into this problem has focused primarily on producing good approximation algorithms. For a given algorithm, the approximation ratio is the provable worst-case ratio between the cost of the algorithm's output and the optimal cost. For most problem instances, however, the optimal cost is unknown, so we instead compute the ratio as the total cost returned by the algorithm divided by the optimal value of the relaxed linear program discussed in Section 1.2. Since the LP optimum is a lower bound on the optimal cost, this quantity is an upper bound on the true ratio for that instance. Jain et al. [2] proved that the k-median problem is $(1 + 2/e)$-hard to approximate in a metric space, where $1 + 2/e \approx 1.736$. We note that, throughout this paper, all of our distance metrics satisfy the properties of a metric space.
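The brute-force search described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the function names (`assignment_cost`, `brute_force_k_median`, `empirical_ratio`), the layout of `dist` as a client-by-facility matrix, and the toy instance are all assumptions made for the example. The ratio helper divides by whatever lower bound it is given; here the brute-force optimum stands in for the LP value, which is not reproduced in this section.

```python
from itertools import combinations
from math import comb


def assignment_cost(dist, facilities):
    """Total cost when every client uses its nearest open facility.

    dist[j][i] is the distance from client j to candidate facility i
    (an assumed layout for this sketch).
    """
    return sum(min(row[i] for i in facilities) for row in dist)


def brute_force_k_median(dist, k):
    """Try every size-k subset of facilities; return (best_subset, best_cost)."""
    n_f = len(dist[0])
    best_subset, best_cost = None, float("inf")
    for subset in combinations(range(n_f), k):
        cost = assignment_cost(dist, subset)  # O(n_c * k) per subset
        if cost < best_cost:
            best_subset, best_cost = subset, cost
    return best_subset, best_cost


def empirical_ratio(algorithm_cost, lower_bound):
    """Cost of a solution divided by a lower bound on the optimal cost."""
    return algorithm_cost / lower_bound


# Toy instance: five points on a line, each both a client and a candidate facility.
points = [0, 1, 2, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]

num_subsets = comb(len(points), 2)            # subsets examined for k = 2
best_subset, best_cost = brute_force_k_median(dist, k=2)

# A hypothetical heuristic that opens facilities 0 and 3, compared to the optimum.
heuristic_cost = assignment_cost(dist, (0, 3))
ratio = empirical_ratio(heuristic_cost, best_cost)
```

The loop over `combinations` visits $\binom{n_f}{k}$ subsets and each cost evaluation touches $n_c k$ distances, which matches the running time stated above. On anything larger than a toy instance the subset count dominates, which is why the sketch is only useful as a correctness baseline.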

[1] O. Kariv et al. An Algorithmic Approach to Network Location Problems. II: The p-Medians, 1979.

[2] John E. Beasley et al. OR-Library: Distributing Test Problems by Electronic Mail, 1990.

[3] Sudipto Guha et al. A constant-factor approximation algorithm for the k-median problem (extended abstract), 1999, STOC '99.

[4] Neal E. Young. K-medians, facility location, and the Chernoff-Wald bound, 2000, SODA '00.

[5] Vijay V. Vazirani et al. Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation, 2001, JACM.

[6] Amin Saberi et al. A new greedy approach for facility location problems, 2002, STOC '02.

[7] Evangelos Markakis et al. Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP, 2002, JACM.

[8] Kamesh Munagala et al. Local Search Heuristics for k-Median and Facility Location Problems, 2004, SIAM J. Comput.

[9] Marek Chrobak et al. The reverse greedy algorithm for the metric k-median problem, 2005, Inf. Process. Lett.

[10] Lecture 5: Primal-Dual Algorithms and Facility Location, 2008.

[11] P. Rousseeuw et al. Partitioning Around Medoids (Program PAM), 2008.

[12] Hae-Sang Park et al. A simple and fast algorithm for K-medoids clustering, 2009, Expert Syst. Appl.

[13] Shi Li et al. A Dependent LP-Rounding Approach for the k-Median Problem, 2012, ICALP.

[14] Shi Li et al. Approximating k-median via pseudo-approximation, 2012, STOC '13.

[15] David P. Williamson et al. An Experimental Evaluation of Incremental and Hierarchical k-Median Algorithms, 2013, JEAL.

[16] Kurt Mehlhorn et al. New Approximability Results for the Robust k-Median Problem, 2013, SWAT.

[17] Ravishankar Krishnaswamy et al. Relax, No Need to Round: Integrality of Clustering Formulations, 2014, ITCS.