Exact and Approximation Algorithms for Clustering

In this paper we present an $n^{O(k^{1-1/d})}$-time algorithm for solving the $k$-center problem in $\mathbb{R}^d$, under the $L_\infty$- and $L_2$-metrics. The algorithm extends to other metrics, and to the discrete $k$-center problem. We also describe a simple $(1+\varepsilon)$-approximation algorithm for the $k$-center problem, with running time $O(n \log k) + (k/\varepsilon)^{O(k^{1-1/d})}$. Finally, we present an $n^{O(k^{1-1/d})}$-time algorithm for solving the $L$-capacitated $k$-center problem, provided that $L = \Omega(n/k^{1-1/d})$ or $L = O(1)$.
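To make the objective concrete, the sketch below evaluates the $k$-center radius of a candidate set of centers and computes a baseline solution with the standard greedy farthest-first traversal (usually attributed to Gonzalez), which is known to give a 2-approximation. This is not the algorithm of the paper: it is a simple $O(nk)$ baseline that picks centers among the input points, and the names `dist`, `kcenter_radius`, and `greedy_k_center` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(p: Point, q: Point, metric: str = "l2") -> float:
    """Distance between p and q under the L_2 (default) or L_infinity metric."""
    if metric == "linf":
        return max(abs(a - b) for a, b in zip(p, q))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kcenter_radius(points: Sequence[Point], centers: Sequence[Point],
                   metric: str = "l2") -> float:
    """k-center objective: the largest distance from a point to its nearest center."""
    return max(min(dist(p, c, metric) for c in centers) for p in points)


def greedy_k_center(points: Sequence[Point], k: int,
                    metric: str = "l2") -> List[Point]:
    """Farthest-first traversal (assumed baseline, not the paper's method).

    Starts from an arbitrary point and repeatedly adds the input point that is
    farthest from the centers chosen so far.
    """
    centers = [points[0]]
    # nearest[i] = distance from points[i] to its closest chosen center
    nearest = [dist(p, centers[0], metric) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[i])
        nearest = [min(d, dist(p, points[i], metric))
                   for d, p in zip(nearest, points)]
    return centers


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = greedy_k_center(pts, k=2)
radius = kcenter_radius(pts, centers)
```

A constant-factor solution of this kind gives a cheap estimate of the optimal radius; how the paper's $(1+\varepsilon)$-approximation is actually built is not specified in the abstract.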
