Bounds for Resequencing by Hybridization

We study the problem of finding the sequence of an unknown DNA fragment given the set of its k-long substrings and a homologous sequence, that is, a sequence similar to the target sequence. Such a sequence is available in some applications, e.g., when detecting single nucleotide polymorphisms. Pe’er and Shamir studied this problem and presented a heuristic algorithm for it. In this paper, we give an algorithm with provable performance: we show that, under some assumptions, the algorithm can reconstruct a random sequence of length $O(4^k)$ with high probability. We also show that no algorithm can reconstruct sequences of length $\Omega(\log k \cdot 4^k)$.

[1] Franco P. Preparata, et al. On the Control of Hybridization Noise in DNA Sequencing-by-Hybridization, 2002, WABI.

[2] R. Drmanac, et al. Sequencing of megabase plus DNA by hybridization: theory of the method, 1989, Genomics.

[3] Jacek Blazewicz, et al. DNA Sequencing With Positive and Negative Errors, 1999, J. Comput. Biol.

[4] Roded Sharan, et al. On the Complexity of Positional Sequencing by Hybridization, 1999, CPM.

[5] A D Mirzabekov, et al. [DNA sequencing by hybridization with oligonucleotides immobilized in a gel. Chemical ligation as a method of expanding the prospects for the method], 1994, Molekuliarnaia biologiia.

[6] Dekel Tsur, et al. Sequencing by hybridization in few rounds, 2003, J. Comput. Syst. Sci.

[7] C R Cantor, et al. Enhanced DNA sequencing by hybridization, 1994, Proceedings of the National Academy of Sciences of the United States of America.

[8] Franco P. Preparata, et al. Enhanced Sequence Reconstruction with DNA Microarray Application, 2001, COCOON.

[9] Ron Shamir, et al. Large Scale Sequencing by Hybridization, 2002, J. Comput. Biol.

[10] Steven Skiena, et al. Positional sequencing by hybridization, 1996, Comput. Appl. Biosci.

[11] Eli Upfal, et al. Sequencing-by-Hybridization at the Information-Theory Bound: An Optimal Algorithm, 2000, J. Comput. Biol.

[12] W. Bains, et al. A novel method for nucleic acid sequence determination, 1988, Journal of theoretical biology.

[13] Alan M. Frieze, et al. Optimal Reconstruction of a Sequence from its Probes, 1999, J. Comput. Biol.

[14] Alan M. Frieze, et al. Optimal Sequencing by Hybridization in Rounds, 2002, J. Comput. Biol.

[15] Ron Shamir, et al. Spectrum Alignment: Efficient Resequencing by Hybridization, 2000, ISMB.

[16] P. Pevzner. l-Tuple DNA sequencing: computer analysis, 1989, Journal of biomolecular structure & dynamics.

[17] Jacek Blazewicz, et al. Tabu search for DNA sequencing with false negatives and false positives, 2000, Eur. J. Oper. Res.

[18] Janusz Kaczmarek, et al. Sequential and parallel algorithms for DNA sequencing, 1997, Comput. Appl. Biosci.

[19] J. Stachowicz, et al. Linking climate change and biological invasions: Ocean warming facilitates nonindigenous species invasions, 2002, Proceedings of the National Academy of Sciences of the United States of America.

[20] Eran Halperin, et al. Handling Long Targets and Errors in Sequencing by Hybridization, 2003, J. Comput. Biol.

[21] Steven Skiena, et al. Reconstructing strings from substrings in rounds, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[22] R. Lipshutz, et al. Likelihood DNA sequencing by hybridization, 1993, Journal of biomolecular structure & dynamics.

[23] Jacek Blazewicz, et al. A heuristic managing errors for DNA sequencing, 2002, Bioinform.

[24] Gesine Reinert, et al. Poisson Process Approximation for Sequence Repeats and Sequencing by Hybridization, 1996, J. Comput. Biol.

[25] Franco P. Preparata, et al. Sequencing by hybridization using direct and reverse cooperating spectra, 2002, RECOMB '02.

[26] Jacek Blazewicz, et al. Hybrid Genetic Algorithm for DNA Sequencing with Errors, 2002, J. Heuristics.

[27] Sagi Snir, et al. Using Restriction Enzymes to Improve Sequencing by Hybridization, 2002.

[28] P. Pevzner, et al. Improved chips for sequencing by hybridization, 1991, Journal of biomolecular structure & dynamics.

[29] Martin E. Dyer, et al. The Probability of Unique Solutions of Sequencing by Hybridization, 1994, J. Comput. Biol.

[30] Ron Shamir, et al. A computational method for resequencing long DNA targets by universal oligonucleotide arrays, 2002, Proceedings of the National Academy of Sciences of the United States of America.

[31] Dekel Tsur. Sequencing by hybridization with errors: handling longer sequences, 2005, Theor. Comput. Sci.

[32] Steven Skiena, et al. Reconstructing Strings from Substrings, 1995, J. Comput. Biol.