An Improved Approximation Algorithm for the k-Means Problem with Penalties

The clustering problem has received much attention in various fields of computer science. In many applications, however, noisy data poses a major challenge for clustering. Clustering with penalties is one way to handle noisy data and has been studied extensively, for example in the k-median problem with penalties and the facility location problem with penalties. As far as we know, there is only one approximation algorithm for the k-means problem with penalties, with ratio \(25+\epsilon \). All previous results for clustering problems with penalties were based on local search, LP-rounding, or primal-dual techniques, none of which can be applied directly to the k-means problem with penalties to obtain an approximation ratio better than \(25+\epsilon \). In this paper, we apply the primal-dual technique to the k-means problem with penalties using a different rounding method: a deterministic rounding algorithm in place of the randomized rounding algorithm used in previous approximation schemes. Based on this method, we present an approximation algorithm with ratio \(19.849+\epsilon \) for the k-means problem with penalties.
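
For readers unfamiliar with the objective, the following minimal Python sketch evaluates the cost of a candidate solution under the standard formulation of k-means with penalties: each point either pays its squared Euclidean distance to the nearest open center or pays its penalty and is left unclustered. It illustrates only the objective, not the primal-dual algorithm of the paper; the function names, the per-point penalty list, and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def penalized_kmeans_cost(
    points: List[Point],
    penalties: List[float],
    centers: List[Point],
) -> Tuple[float, List[int]]:
    """Cost of a fixed set of centers under the k-means-with-penalties objective.

    For fixed centers, the best choice for each point is the cheaper of
    (a) its squared distance to the nearest center and (b) its penalty.
    Returns the total cost and the indices of the penalized points.
    """
    total = 0.0
    penalized: List[int] = []
    for idx, (pt, pen) in enumerate(zip(points, penalties)):
        nearest = min(squared_distance(pt, c) for c in centers)
        if pen < nearest:
            # Cheaper to pay the penalty than to connect this point.
            total += pen
            penalized.append(idx)
        else:
            total += nearest
    return total, penalized


# Toy instance (hypothetical data): two nearby points and one far-away noisy point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
penalties = [5.0, 5.0, 5.0]
centers = [(0.5, 0.0)]
cost, penalized = penalized_kmeans_cost(points, penalties, centers)
```

In this toy instance the far-away point is penalized rather than clustered, which is how the penalty term absorbs noisy data. Choosing the centers themselves is the hard part that an approximation algorithm addresses.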
