Paper information - Approximate clustering of incomplete fingerprints

We study the problem of clustering fingerprints with at most $p$ missing values (CMV($p$) for short) naturally arising in oligonucleotide fingerprinting, which is an efficient method for characterizing DNA clone libraries. We show that already CMV(2) is NP-hard. We also show that a greedy algorithm yields a $\min(1+\ln n,\ 2+p\ln l)$ approximation for CMV($p$), and can be implemented to run in $O(nl2^p)$ time. We also introduce other variants of the problem of clustering incomplete fingerprints based on slightly different optimization criteria and show that they can be approximated in polynomial time with ratios $2^{2p-1}$ and $2(1-\frac{1}{2^{2p}})$, respectively.
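
The abstract does not spell out the greedy algorithm, so the following is only an illustrative sketch under stated assumptions. It assumes that fingerprints are strings over `0`, `1`, and `N` (missing), and that the goal is to fill in the missing values so that the number of distinct resolved fingerprints is as small as possible. Under that reading, each fully specified vector "covers" the fingerprints compatible with it, and a set-cover-style greedy rule picks the vector covering the most unassigned fingerprints. The names `resolutions` and `greedy_cluster` are hypothetical, and this naive version makes no attempt to reach the $O(nl2^p)$ running time stated above.

```python
from itertools import product


def resolutions(fp):
    """Yield every fully specified binary string compatible with fp.

    Assumption: 'N' marks a missing value; all other characters are '0' or '1'.
    A fingerprint with p missing values has 2**p resolutions.
    """
    missing = [i for i, c in enumerate(fp) if c == "N"]
    for bits in product("01", repeat=len(missing)):
        chars = list(fp)
        for i, b in zip(missing, bits):
            chars[i] = b
        yield "".join(chars)


def greedy_cluster(fingerprints):
    """Set-cover-style greedy clustering (illustrative, not the paper's code).

    Returns a dict mapping each chosen resolved vector to the sorted indices
    of the fingerprints assigned to it.
    """
    # Map each candidate resolved vector to the fingerprints it can cover.
    covers = {}
    for idx, fp in enumerate(fingerprints):
        for r in resolutions(fp):
            covers.setdefault(r, set()).add(idx)

    unassigned = set(range(len(fingerprints)))
    clusters = {}
    while unassigned:
        # Pick the vector compatible with the most unassigned fingerprints.
        best = max(covers, key=lambda r: len(covers[r] & unassigned))
        chosen = covers[best] & unassigned
        clusters[best] = sorted(chosen)
        unassigned -= chosen
    return clusters


example = ["0N1", "001", "N11", "011", "1N0"]
result = greedy_cluster(example)
```

In this small example, `011` is compatible with three of the five fingerprints, so the greedy rule selects it first and then handles the remaining two fingerprints one at a time.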
