Min Sum Clustering with Penalties

Traditionally, clustering problems are investigated under the assumption that all objects must be clustered. A shortcoming of this formulation is that a few distant objects, called outliers, may exert a disproportionately strong influence over the solution. In this work we investigate the k-min-sum clustering problem while addressing outliers in a meaningful way. Given a complete graph $G = (V,E)$, a weight function $w : E \rightarrow \mathbb{N}_0$ on its edges, and a penalty function $p : V \rightarrow \mathbb{N}_0$ on its nodes, the penalized k-min-sum problem is the problem of finding a partition of $V$ into $k+1$ sets, $\{S_1,\ldots,S_{k+1}\}$, minimizing $\sum_{i=1}^{k} w(S_i) + p(S_{k+1})$, where for $S \subseteq V$, $w(S) = \sum_{e=\{i,j\} \subseteq S} w_e$ and $p(S) = \sum_{i \in S} p_i$. We offer an efficient 2-approximation to the penalized 1-min-sum problem using a primal-dual algorithm. We prove that the penalized 1-min-sum problem is NP-hard even if w is a metric and present a randomized approximation scheme for it. For the metric penalized k-min-sum problem we offer a 2-approximation.
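To make the objective concrete, the following is a minimal Python sketch that evaluates $\sum_{i=1}^{k} w(S_i) + p(S_{k+1})$ for a given partition and, for the case $k = 1$, finds an optimal solution by exhaustive search on a tiny instance. It is not the primal-dual algorithm or the approximation scheme mentioned above, only a brute-force reference. The function names, the dictionary representation of $w$ and $p$, the toy instance, and the choice to allow empty sets in the partition are all assumptions made for illustration.

```python
from itertools import combinations


def cluster_weight(w, cluster):
    """w(S): sum of edge weights over all unordered pairs inside the cluster."""
    return sum(w[frozenset(pair)] for pair in combinations(sorted(cluster), 2))


def penalized_cost(w, p, clusters, outliers):
    """Objective value: sum of w(S_i) over the k clusters plus p(S_{k+1})."""
    inside = sum(cluster_weight(w, s) for s in clusters)
    penalty = sum(p[v] for v in outliers)
    return inside + penalty


def brute_force_one_cluster(w, p, nodes):
    """Exhaustive search for penalized 1-min-sum.

    Tries every subset of nodes as the single cluster (exponential time,
    so only suitable for very small inputs). Empty sets are allowed here,
    which is an assumption of this sketch.
    """
    nodes = list(nodes)
    best_cost, best_cluster = None, None
    for size in range(len(nodes) + 1):
        for cluster in combinations(nodes, size):
            outliers = [v for v in nodes if v not in cluster]
            cost = penalized_cost(w, p, [cluster], outliers)
            if best_cost is None or cost < best_cost:
                best_cost, best_cluster = cost, set(cluster)
    return best_cost, best_cluster


# Hypothetical toy instance: nodes 0, 1, 2 are mutually close, node 3 is far.
nodes = [0, 1, 2, 3]
w = {frozenset(e): 1 for e in [(0, 1), (0, 2), (1, 2)]}
w.update({frozenset(e): 10 for e in [(0, 3), (1, 3), (2, 3)]})
p = {0: 5, 1: 5, 2: 5, 3: 4}

best_cost, best_cluster = brute_force_one_cluster(w, p, nodes)
```

On this instance, keeping all four nodes in the cluster costs 33, while paying the penalty of 4 for node 3 and clustering the remaining three nodes costs 3 + 4 = 7, which illustrates how the penalty set absorbs an outlier.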
