Accelerating DNA sequencing-by-hybridization with noise

DNA sequencing-by-hybridization (SBH) has received much attention from several research communities as a potential alternative to current wet-lab technologies. In real applications, the experimental environment cannot be assumed to be error-free. Under the assumption of random, independent hybridization errors, Leong et al. [9] presented a sequence reconstruction algorithm whose output accuracy degrades gracefully as the error rate increases. As the authors themselves acknowledged, however, a notable drawback of their method is its high computational cost. In this paper, we show that the inefficiency of [9] stems from conflating situations with widely different characteristics and handling all of them in the safest, but also slowest, way. Our new algorithm addresses this problem by pushing the analysis down to a finer level, where a more efficient solution is proposed. Experiments on real human genome datasets demonstrate that the new method yields significant performance improvements while maintaining almost the same output accuracy.
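To make the error model concrete, the following is a minimal sketch of an SBH spectrum under random independent hybridization errors. It is illustrative only and is not the reconstruction algorithm of [9] or of this paper; the function names (`spectrum`, `noisy_spectrum`), the probe length `k`, and the assumption of a single symmetric `error_rate` for false positives and false negatives are all hypothetical choices made for the example.

```python
import itertools
import random

ALPHABET = "ACGT"


def spectrum(seq, k):
    """Return the set of all length-k substrings (probes) that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def noisy_spectrum(seq, k, error_rate, rng):
    """Simulate a noisy hybridization experiment (assumed model).

    Each of the 4**k possible probes is tested once. Its true outcome
    (present or absent in seq) is flipped independently with probability
    error_rate, which produces both false negatives and false positives.
    """
    true_probes = spectrum(seq, k)
    observed = set()
    for letters in itertools.product(ALPHABET, repeat=k):
        probe = "".join(letters)
        present = probe in true_probes
        if rng.random() < error_rate:
            present = not present  # independent hybridization error
        if present:
            observed.add(probe)
    return observed


if __name__ == "__main__":
    rng = random.Random(0)
    target = "ACGTACGGTCA"
    clean = spectrum(target, 4)
    noisy = noisy_spectrum(target, 4, 0.01, rng)
    print("false negatives:", sorted(clean - noisy))
    print("false positives:", sorted(noisy - clean))
```

A reconstruction algorithm for noisy SBH has to recover the target sequence from the observed set alone, without knowing which probes are false negatives or false positives.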

[1] Franco P. Preparata, et al. Sequencing-by-hybridization revisited: the analog-spectrum proposal, 2004, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[2] R. Drmanac, et al. An algorithm for the DNA sequence generation from k-tuple word contents of the minimal number of random fragments, 1991, Journal of biomolecular structure & dynamics.

[3] Craig A. Stewart, et al. Introduction to computational biology, 2005.

[4] K. Doi, et al. Sequencing by hybridization in the presence of hybridization errors, 2000, Genome informatics. Workshop on Genome Informatics.

[5] Franco P. Preparata, et al. Enhanced Sequence Reconstruction with DNA Microarray Application, 2001, COCOON.

[6] W. Bains, et al. A novel method for nucleic acid sequence determination, 1988, Journal of theoretical biology.

[7] P. Pevzner. 1-Tuple DNA sequencing: computer analysis, 1989, Journal of biomolecular structure & dynamics.

[8] P. Pevzner, et al. Improved chips for sequencing by hybridization, 1991, Journal of biomolecular structure & dynamics.

[9] Martin E. Dyer, et al. The Probability of Unique Solutions of Sequencing by Hybridization, 1994, J. Comput. Biol.

[10] Eli Upfal, et al. Sequencing-by-Hybridization at the Information-Theory Bound: An Optimal Algorithm, 2000, J. Comput. Biol.

[11] Gérard Vergoten, et al. Biomolecular structure and dynamics, 1997.

[12] Paul Schliekelman, et al. Statistical Methods in Bioinformatics: An Introduction, 2001.

[13] R. Lipshutz, et al. Likelihood DNA sequencing by hybridization, 1993, Journal of biomolecular structure & dynamics.

[14] Franco P. Preparata, et al. On the Control of Hybridization Noise in DNA Sequencing-by-Hybridization, 2002, WABI.

[15] R. Drmanac, et al. Sequencing of megabase plus DNA by hybridization: theory of the method, 1989, Genomics.

[16] D. M. Brown, et al. 5-Nitroindole as an universal base analogue, 1994, Nucleic acids research.

[17] Gregory R. Grant, et al. Statistical Methods in Bioinformatics, 2001.

[18] Alan M. Frieze, et al. On the power of universal bases in sequencing by hybridization, 1999, RECOMB.

[19] Sean R. Eddy, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, 1998.

[20] Pavel A. Pevzner, et al. Towards DNA Sequencing Chips, 1994, MFCS.

[21] David Martin, et al. Computational Molecular Biology: An Algorithmic Approach, 2001.