Approximate clustering of incomplete fingerprints

We study the problem of clustering fingerprints with at most $p$ missing values (CMV($p$) for short), which arises naturally in oligonucleotide fingerprinting, an efficient method for characterizing DNA clone libraries. We show that CMV(2) is already NP-hard. We also show that a greedy algorithm yields a $\min(1+\ln n,\ 2+p\ln l)$ approximation for CMV($p$), where $n$ is the number of fingerprints and $l$ is their length, and that it can be implemented to run in $O(nl2^p)$ time. Finally, we introduce two other variants of the problem of clustering incomplete fingerprints, based on slightly different optimization criteria, and show that they can be approximated in polynomial time with ratios $2^{2^{p-1}}$ and $2\left(1-\frac{1}{2^{2^p}}\right)$, respectively.
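To make the greedy idea concrete, here is a minimal sketch in the spirit of greedy set cover. It rests on several assumptions that the abstract does not state: fingerprints are binary strings in which the hypothetical symbol `N` marks a missing value, and a cluster is a set of fingerprints sharing a common completion (for position-wise compatibility this is equivalent to the fingerprints being pairwise compatible). Each fingerprint with at most $p$ missing values has at most $2^p$ completions, so the sketch enumerates them and, in every round, picks the complete vector compatible with the most uncovered fingerprints. The names `completions` and `greedy_clustering` are illustrative only. This is not necessarily the authors' exact algorithm, and because it recounts completions in every round, it does not attain the $O(nl2^p)$ total running time claimed above.

```python
from collections import defaultdict
from itertools import product

MISSING = "N"  # assumed symbol for a missing value


def completions(fp):
    """Yield every 0/1 string that agrees with fp on its known positions."""
    holes = [i for i, ch in enumerate(fp) if ch == MISSING]
    chars = list(fp)
    # At most 2^p assignments for a fingerprint with p missing values.
    for bits in product("01", repeat=len(holes)):
        for i, b in zip(holes, bits):
            chars[i] = b
        yield "".join(chars)


def greedy_clustering(fingerprints):
    """Greedy set-cover-style clustering (illustrative sketch).

    Returns a list of (center, member_indices) pairs in which every member
    is compatible with its complete center vector.
    """
    uncovered = set(range(len(fingerprints)))
    clusters = []
    while uncovered:
        # Map each candidate center to the uncovered fingerprints it completes.
        members_by_center = defaultdict(list)
        for idx in sorted(uncovered):
            for c in completions(fingerprints[idx]):
                members_by_center[c].append(idx)
        # Largest cluster first; ties broken by the lexicographically smallest center.
        center = min(members_by_center,
                     key=lambda c: (-len(members_by_center[c]), c))
        members = members_by_center[center]
        clusters.append((center, members))
        uncovered.difference_update(members)
    return clusters


if __name__ == "__main__":
    print(greedy_clustering(["0N1", "011", "1N0", "N10"]))
```

On the four example fingerprints, the candidate centers `011` and `110` each cover two fingerprints, so the sketch returns two clusters: `[('011', [0, 1]), ('110', [2, 3])]`.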