Adaptive Control of Hybridization Noise in DNA Sequencing-by-Hybridization

We consider the problem of sequence reconstruction in sequencing-by-hybridization in the presence of spectrum errors. As suggested by intuition and reported in the literature, false negatives (i.e., missing spectrum probes) are by far the leading cause of reconstruction failures. In a recent paper we described an algorithm, called "threshold-theta", designed to recover from false negatives. This algorithm overcompensates for missing extensions by allowing larger reconstruction subtrees. We demonstrated, both analytically and with simulations, that the approach becomes increasingly effective as the parameter theta grows, but also pointed out that for larger error rates the size of the extension trees translates into an unacceptable computational burden. To obviate this shortcoming, in this paper we propose an adaptive approach that is both effective and efficient: effective, because for a fixed value of theta it performs as well as its single-threshold counterpart; efficient, because it exhibits substantial speed-ups over it. The idea is that, for moderate error rates, only a small fraction of the target sequence is expected to be involved in error recovery; the remainder of the sequence is therefore expected to be reconstructible by the standard noiseless algorithm, with the provision to switch to increasingly higher thresholds after a failure is detected. This policy generates interesting and complex interplays between fooling probes and false negatives. These phenomena are carefully analyzed for random sequences, and the results are found to be in excellent agreement with the simulations. In addition, the experimentally observed algorithmic speed-ups of the multithreshold approach are explained in terms of the interaction among the different threshold regimes.
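The escalation policy described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes classical contiguous k-mer probes, a known seed of length k-1, a known target length, and a fixed lookahead depth, and all names (`spectrum_of`, `path_exists`, `candidates`, `reconstruct`) are hypothetical. Each position is first attempted with threshold 0, that is, the noiseless rule; the threshold is raised only when no extension survives.

```python
from typing import List, Optional, Set, Tuple

ALPHABET = "ACGT"


def spectrum_of(target: str, k: int) -> Set[str]:
    """Error-free spectrum: the set of all length-k substrings of target."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def path_exists(suffix: str, spectrum: Set[str], depth: int, misses_left: int) -> bool:
    """True if suffix (the last k-1 symbols) can be extended by `depth` more
    symbols while using at most `misses_left` probes absent from the spectrum."""
    if depth == 0:
        return True
    for base in ALPHABET:
        probe = suffix + base
        cost = 0 if probe in spectrum else 1  # an absent probe is a presumed false negative
        if cost <= misses_left and path_exists(probe[1:], spectrum, depth - 1, misses_left - cost):
            return True
    return False


def candidates(seq: str, spectrum: Set[str], k: int, depth: int, theta: int) -> List[str]:
    """Next symbols that start a lookahead path of `depth` symbols with at most theta misses."""
    suffix = seq[-(k - 1):]
    found = []
    for base in ALPHABET:
        probe = suffix + base
        cost = 0 if probe in spectrum else 1
        if cost <= theta and path_exists(probe[1:], spectrum, depth - 1, theta - cost):
            found.append(base)
    return found


def reconstruct(seed: str, spectrum: Set[str], k: int, length: int,
                depth: int = 3, theta_max: int = 1) -> Tuple[Optional[str], int]:
    """Adaptive reconstruction (toy version). Returns (sequence or None on
    failure, number of positions that needed a threshold above 0)."""
    seq, escalations = seed, 0
    while len(seq) < length:
        d = min(depth, length - len(seq))  # do not look past the end of the target
        for theta in range(theta_max + 1):
            cands = candidates(seq, spectrum, k, d, theta)
            if cands:  # stop escalating once some extension survives
                break
        if len(cands) != 1:  # no extension, or an ambiguity this sketch does not resolve
            return None, escalations
        if theta > 0:
            escalations += 1
        seq += cands[0]
    return seq, escalations


if __name__ == "__main__":
    target, k = "ACGTACCTGAAT", 5
    noisy = spectrum_of(target, k) - {"TACCT"}  # simulate one false negative
    print(reconstruct(target[:k - 1], noisy, k, len(target), theta_max=1))
    print(reconstruct(target[:k - 1], noisy, k, len(target), theta_max=0))
```

In this toy run only the positions whose lookahead window contains the missing probe require the higher threshold; every other position is resolved by the cheaper threshold-0 search, which is the source of the speed-up the adaptive policy aims for.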
