Constant approximation for k-median and k-means with outliers via iterative rounding

In this paper, we present a new iterative rounding framework for many clustering problems. Using this framework, we obtain an (α1 + ε ≤ 7.081 + ε)-approximation algorithm for k-median with outliers, greatly improving upon the large implicit constant approximation ratio of Chen. For k-means with outliers, we give an (α2 + ε ≤ 53.002 + ε)-approximation, which is the first O(1)-approximation for this problem. The iterative framework is versatile: we show how it can be used to give α1- and (α1 + ε)-approximation algorithms for the matroid median and knapsack median problems, respectively, improving upon the previous best approximation ratios of 8 due to Swamy and 17.46 due to Byrka et al. The natural LP relaxation for k-median/k-means with outliers has an unbounded integrality gap. Despite this negative result, our iterative rounding framework shows that we can round an LP solution to an almost-integral solution of small cost, one with at most two fractionally open facilities. Thus, the LP integrality gap arises from the gap between almost-integral and fully integral solutions. Then, using a pre-processing procedure, we show how to convert an almost-integral solution to a fully integral solution while losing only a constant factor in the approximation ratio. By further using a sparsification technique, the additive loss in the approximation factor incurred by the conversion can be reduced to any ε > 0.
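
To make the objective concrete, the following is a minimal sketch of how the cost of a candidate solution could be evaluated, assuming the standard formulation in which a fixed number of clients must be served and the rest are discarded as outliers. It is not the paper's rounding algorithm. The names `cost_with_outliers`, `dist`, `centers`, `num_served`, and `power` are hypothetical and chosen only for illustration; `power=1` gives the k-median objective and `power=2` the k-means objective.

```python
from typing import Sequence


def cost_with_outliers(
    dist: Sequence[Sequence[float]],
    centers: Sequence[int],
    num_served: int,
    power: int = 1,
) -> float:
    """Cost of serving the `num_served` cheapest clients from `centers`.

    dist[j][i] is the distance from client j to facility i (assumed layout).
    Each client connects to its nearest open center; the clients with the
    largest connection costs are treated as outliers and pay nothing.
    """
    if not centers:
        raise ValueError("at least one center must be open")
    if not 0 <= num_served <= len(dist):
        raise ValueError("num_served must be between 0 and the number of clients")

    # Connection cost of every client to its nearest open center.
    per_client = [min(row[i] for i in centers) ** power for row in dist]

    # Keep only the cheapest `num_served` clients; the rest are outliers.
    per_client.sort()
    return sum(per_client[:num_served])


if __name__ == "__main__":
    # Four clients, two candidate facilities (toy distances, not from the paper).
    toy = [[1, 5], [2, 4], [10, 9], [3, 1]]
    print(cost_with_outliers(toy, centers=[0], num_served=3))           # k-median cost
    print(cost_with_outliers(toy, centers=[0], num_served=3, power=2))  # k-means cost
```

For a fixed set of centers, discarding the clients with the largest connection costs is the best choice of outliers, so the difficulty lies entirely in choosing the centers; the sketch only evaluates a given choice.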

[1]  Amin Saberi,et al.  A new greedy approach for facility location problems , 2002, STOC '02.

[2]  Philip N. Klein,et al.  The power of local search for clustering , 2016, ArXiv.

[3]  Amit Kumar,et al.  Constant factor approximation algorithm for the knapsack median problem , 2012, SODA.

[4]  Aravind Srinivasan,et al.  An Improved Approximation Algorithm for Knapsack Median Using Sparsification , 2017, Algorithmica.

[5]  Ke Chen,et al.  A constant factor approximation algorithm for k-median clustering with outliers , 2008, SODA '08.

[6]  Aristides Gionis,et al.  k-means--: A Unified Approach to Clustering and Outlier Detection , 2013, SDM.

[7]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[8]  Mohammad Mahdian,et al.  Approximation Algorithms for Metric Facility Location Problems , 2006, SIAM J. Comput.

[9]  Evangelos Markakis,et al.  Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP , 2002, JACM.

[10]  Napat Rujeerapaiboon,et al.  Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization , 2017, SIAM J. Optim.

[11]  Shi Li,et al.  A 1.488 approximation algorithm for the uncapacitated facility location problem , 2011, Inf. Comput.

[12]  Fabio Tozeto Ramos,et al.  On Integrated Clustering and Outlier Detection , 2014, NIPS.

[13]  Sergei Vassilvitskii,et al.  Local Search Methods for k-Means with Outliers , 2017, Proc. VLDB Endow.

[14]  Kamesh Munagala,et al.  Local search heuristic for k-median and facility location problems , 2001, STOC '01.

[15]  Euiwoong Lee,et al.  Improved and simplified inapproximability for k-means , 2015, Inf. Process. Lett.

[16]  Vijay V. Vazirani,et al.  Approximation algorithms for metric facility location and k-Median problems using the primal-dual schema and Lagrangian relaxation , 2001, JACM.

[17]  David P. Williamson,et al.  The Design of Approximation Algorithms , 2011 .

[18]  Samir Khuller,et al.  Algorithms for facility location problems with outliers , 2001, SODA '01.

[19]  Avrim Blum,et al.  Stability Yields a PTAS for k-Median and k-Means Clustering , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[20]  Amit Kumar,et al.  The matroid median problem , 2011, SODA '11.

[21]  Aravind Srinivasan,et al.  An Improved Approximation for k-Median and Positive Correlation in Budgeted Optimization , 2014, SODA.

[22]  Amit Kumar,et al.  Clustering with Spectral Norm and the k-Means Algorithm , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[23]  Éva Tardos,et al.  Approximation algorithms for facility location problems (extended abstract) , 1997, STOC '97.

[24]  Samir Khuller,et al.  Greedy strikes back: improved facility location algorithms , 1998, SODA '98.

[25]  Bodo Manthey,et al.  Smoothed Analysis of the k-Means Method , 2011, JACM.

[26]  Jeffrey Scott Vitter,et al.  Approximation Algorithms for Geometric Median Problems , 1992, Inf. Process. Lett.

[27]  Rajmohan Rajaraman,et al.  Analysis of a local search heuristic for facility location problems , 2000, SODA '98.

[28]  Mohammad Taghi Hajiaghayi,et al.  Local Search Algorithms for the Red-Blue Median Problem , 2011, Algorithmica.

[29]  Sudipto Guha,et al.  Improved combinatorial algorithms for the facility location and k-median problems , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).

[30]  Vincent Cohen-Addad,et al.  On the Local Structure of Stable Clustering Instances , 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[31]  Satish Rao,et al.  Approximation schemes for Euclidean k-medians and related problems , 1998, STOC '98.

[32]  Ola Svensson,et al.  Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms , 2016, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[33]  Shi Li,et al.  A Dependent LP-Rounding Approach for the k-Median Problem , 2012, ICALP.

[34]  Sudipto Guha,et al.  A constant-factor approximation algorithm for the k-median problem (extended abstract) , 1999, STOC '99.

[35]  Mohammad R. Salavatipour,et al.  Approximation Schemes for Clustering with Outliers , 2018, SODA.

[36]  Shi Li,et al.  Approximating k-median via pseudo-approximation , 2012, STOC '13.

[37]  Jaroslaw Byrka.  An Optimal Bifactor Approximation Algorithm for the Metric Uncapacitated Facility Location Problem , 2007, APPROX-RANDOM.

[38]  Leonard J. Schulman,et al.  The effectiveness of Lloyd-type methods for the k-means problem , 2013 .

[39]  Chaitanya Swamy,et al.  Improved Approximation Algorithms for Matroid and Knapsack Median Problems and Applications , 2013, APPROX-RANDOM.

[40]  Maria-Florina Balcan,et al.  Clustering under approximation stability , 2013, JACM.

[41]  Fabián A. Chudak,et al.  Improved Approximation Algorithms for the Uncapacitated Facility Location Problem , 2003, SIAM J. Comput.

[42]  Mohammad R. Salavatipour,et al.  Local Search Yields a PTAS for k-Means in Doubling Metrics , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[43]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[44]  Jaroslaw Byrka,et al.  An Optimal Bifactor Approximation Algorithm for the Metric Uncapacitated Facility Location Problem , 2006, SIAM J. Comput.
