Approximation Algorithms for Clustering to Minimize the Sum of Diameters

We consider the problem of partitioning the n nodes of a complete edge-weighted graph into k clusters so as to minimize the sum of the diameters of the clusters. Since the problem is NP-complete, our focus is on the development of good approximation algorithms. When edge weights satisfy the triangle inequality, we present the first approximation algorithm for the problem. The algorithm yields a solution with O(k) clusters whose sum of cluster diameters is within a factor of O(ln(n/k)) of the optimal value using exactly k clusters. Our approach also permits a tradeoff among the constant terms hidden by the two big-O terms and the running time. For any fixed k, we present an approximation algorithm that produces k clusters whose total diameter is at most twice the optimal value. When the distances are not required to satisfy the triangle inequality, we show that, unless P = NP, for any ρ ≥ 1, there is no polynomial-time approximation algorithm that can provide a performance guarantee of ρ, even when the number of clusters is fixed at 3. We also present some results for the problem of minimizing the sum of cluster radii.
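To make the objective concrete, the following is a minimal Python sketch of the quantity being minimized. It is not the approximation algorithm from the paper; it only evaluates the sum of cluster diameters for a given partition and, for very small inputs, finds the optimum by exhaustive search. The function names (`diameter`, `sum_of_diameters`, `brute_force_optimum`) and the representation of the graph as a symmetric distance matrix with a zero diagonal are assumptions made for illustration.

```python
from itertools import product


def diameter(dist, cluster):
    """Largest pairwise distance inside a cluster (0 for a singleton or empty cluster)."""
    return max((dist[u][v] for u in cluster for v in cluster), default=0)


def sum_of_diameters(dist, clusters):
    """Objective value of a partition: the sum of its cluster diameters."""
    return sum(diameter(dist, c) for c in clusters)


def brute_force_optimum(dist, k):
    """Exhaustive search over all partitions into exactly k nonempty clusters.

    Illustrative only: the search takes time exponential in the number of
    nodes, so it is usable only on tiny instances.
    """
    n = len(dist)
    best_cost, best_clusters = float("inf"), None
    for labels in product(range(k), repeat=n):
        if len(set(labels)) != k:
            continue  # skip assignments that leave some cluster empty
        clusters = [[v for v in range(n) if labels[v] == c] for c in range(k)]
        cost = sum_of_diameters(dist, clusters)
        if cost < best_cost:
            best_cost, best_clusters = cost, clusters
    return best_cost, best_clusters


# Hypothetical instance: four points on a line at positions 0, 1, 10, 11.
points = [0, 1, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
print(brute_force_optimum(dist, 2))  # (2, [[0, 1], [2, 3]])
```

On this instance a single cluster has diameter 11, while splitting into the two natural pairs gives a total diameter of 2. An approximation algorithm with ratio 2, such as the one the abstract describes for fixed k, would be guaranteed to return a partition of cost at most 4 here.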
