Fast Clustering with Lower Bounds: No Customer too Far, No Shop too Small

We study the Lower Bounded Center (LBC) problem, a clustering problem that can be viewed as a variant of the $k$-Center problem. In the LBC problem, we are given a set of points $P$ in a metric space and a lower bound $\lambda$. We must select a set $C \subseteq P$ of centers and an assignment that maps each point in $P$ to a center of $C$ such that each center of $C$ is assigned at least $\lambda$ points. The price of an assignment is the maximum distance between a point and the center it is assigned to, and the goal is to find a set of centers and an assignment of minimum price. We give a constant-factor approximation algorithm for the LBC problem that runs in $O(n \log n)$ time when the input points lie in $d$-dimensional Euclidean space $\mathbb{R}^d$, where $d$ is a constant. We also prove that the problem cannot be approximated within a factor of $1.8 - \epsilon$ unless $\mathrm{P} = \mathrm{NP}$, even if the input points lie in the Euclidean plane $\mathbb{R}^2$.
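To make the objective concrete, the following minimal Python sketch checks whether a given assignment is feasible and computes its price. It is not the approximation algorithm from the paper; it only evaluates a candidate solution. The function name `lbc_price` and the index-based encoding of the assignment are assumptions made for illustration, and the points are assumed to lie in Euclidean space.

```python
import math
from collections import Counter


def lbc_price(points, assignment, lam):
    """Evaluate a candidate LBC solution (hypothetical helper, for illustration).

    points:     list of coordinate tuples, the input set P.
    assignment: assignment[i] is the index in `points` of the center
                that point i is assigned to; the set of centers C is
                the set of indices that appear in `assignment`.
    lam:        the lower bound lambda on points per center.

    Returns the price (maximum point-to-center distance), or None if
    some center receives fewer than `lam` points.
    """
    if len(assignment) != len(points):
        raise ValueError("every point must be assigned to exactly one center")

    # Count how many points each chosen center serves.
    load = Counter(assignment)
    if any(count < lam for count in load.values()):
        return None  # lower-bound constraint violated

    # Price = largest Euclidean distance from a point to its center.
    return max(
        (math.dist(points[i], points[c]) for i, c in enumerate(assignment)),
        default=0.0,
    )


points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1)]
two_clusters = [0, 0, 0, 3, 3, 3]   # centers at indices 0 and 3
one_cluster = [0, 0, 0, 0, 0, 0]    # single center at index 0

print(lbc_price(points, two_clusters, lam=3))  # feasible with two centers
print(lbc_price(points, two_clusters, lam=4))  # infeasible: each center has 3 points
print(lbc_price(points, one_cluster, lam=4))   # feasible, but at a higher price
```

In this toy instance, raising $\lambda$ from 3 to 4 forces the two natural clusters to merge, which illustrates how the lower bound, rather than a fixed number of centers, determines the shape of the solution.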
