We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by $\varrho$. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S. We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomial-time solutions when $k = O(\log n)$ and $p = O(1)$, or when $p = 2$. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the $L_1$ metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P = NP. We also prove that for any $\varepsilon > 0$ it is NP-hard to split S into at most $p k^{1/7-\varepsilon}$ clusters whose Hamming diameter does not exceed the p-diameter, and that solving HDC exactly is an NP-complete problem already for $p = 3$. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time $O(pkn)$. Next, we describe a $2^{O(p\varrho/\varepsilon)} k^{O(p/\varepsilon)} n^2$-time $(1+\varepsilon)$-approximation algorithm for HRC. In particular, it runs in polynomial time when $p = O(1)$ and $\varrho = O(\log(k+n))$. Finally, we show how to find in [formula image missing in source] time a set L of $O(p \log k)$ strings of length n such that for each string in S there is at least one string in L within distance $(1+\varepsilon)\varrho$, for any constant 0
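As an illustration of the factor-two bound mentioned in the abstract, the following is a minimal Python sketch of farthest-point clustering applied to binary strings under Hamming distance. It is not taken from the paper: the function names `hamming` and `farthest_point_centers`, the choice of the first string as the initial center, and the restriction of centers to strings from S are all assumptions made for this example. Each round costs O(kn) and there are at most p rounds, which is consistent with the O(pkn) running time stated above.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def farthest_point_centers(strings: List[str], p: int) -> Tuple[List[str], int]:
    """Pick up to p centers from `strings` by the farthest-point rule.

    Returns the chosen centers and the largest distance from any input
    string to its nearest chosen center.
    """
    # Assumption: the first center is arbitrary; we take the first string.
    centers = [0]
    # dist[i] = distance from strings[i] to its nearest chosen center.
    dist = [hamming(s, strings[0]) for s in strings]

    while len(centers) < min(p, len(strings)):
        # The next center is the string farthest from all current centers.
        nxt = max(range(len(strings)), key=dist.__getitem__)
        if dist[nxt] == 0:
            break  # every string already coincides with some center
        centers.append(nxt)
        # One O(kn) pass to refresh nearest-center distances.
        for i, s in enumerate(strings):
            d = hamming(s, strings[nxt])
            if d < dist[i]:
                dist[i] = d

    return [strings[c] for c in centers], max(dist)


if __name__ == "__main__":
    S = ["0000", "0001", "1111", "1110", "0011"]
    chosen, radius = farthest_point_centers(S, 2)
    print(chosen, radius)
```

Assigning each string to its nearest returned center gives the corresponding partition of S for the diameter variant. The sketch only ever chooses centers from S itself, so it cannot in general reach the optimal p-radius, whose centers may be arbitrary binary strings of length n.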
[1] Andrzej Lingas, et al. Efficient approximation algorithms for the Hamming center problem. 1999, SODA '99.
[2] Bernard Kolman, et al. Discrete Mathematical Structures. 1984.
[3] Bin Ma, et al. Distinguishing string selection problems. 2003, SODA '99.
[4] David S. Johnson, et al. Computers and Intractability: A Guide to the Theory of NP-Completeness. 1978.
[5] Mihir Bellare, et al. Free bits, PCPs and non-approximability-towards tight results. 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[6] Bin Ma, et al. Finding similar regions in many strings. 1999, STOC '99.
[7] Rina Panigrahy, et al. An O(log*n) approximation algorithm for the asymmetric p-center problem. 1996, SODA '96.
[8] Teofilo F. Gonzalez, et al. Clustering to Minimize the Maximum Intercluster Distance. 1985, Theor. Comput. Sci.
[9] Dan Gusfield, et al. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. 1997.
[10] David B. Shmoys, et al. A Best Possible Heuristic for the k-Center Problem. 1985, Math. Oper. Res.
[11] Christos H. Papadimitriou, et al. On the complexity of integer programming. 1981, JACM.
[12] Tomás Feder, et al. Optimal algorithms for approximate clustering. 1988, STOC '88.
[13] Dorit S. Hochbaum, et al. Approximation Algorithms for NP-Hard Problems. 1996.
[14] David B. Shmoys, et al. A unified approach to approximation algorithms for bottleneck problems. 1986, JACM.
[15] Mihir Bellare, et al. Free Bits, PCPs, and Nonapproximability-Towards Tight Results. 1998, SIAM J. Comput.