Accurate Reconstruction for DNA Sequencing by Hybridization Based on a Constructive Heuristic

Sequencing by hybridization is a promising, cost-effective technology for high-throughput DNA sequencing on microarray chips. However, experimental conditions introduce errors into the spectrum, which makes accurate and fast reconstruction of the original sequence a challenging problem. Over the last decade, a variety of analyses and algorithm designs have been proposed to address this problem, each with a different trade-off between speed and accuracy. Motivated by the idea that errors can be identified by analyzing the interrelation of spectrum elements, this paper presents a constructive heuristic algorithm whose reconstruction is guided by a set of well-defined criteria and rules. Instead of reconstructing the original sequence directly, the new algorithm first builds several accurate short fragments, which are then carefully assembled into a whole sequence. Experiments on benchmark instance sets demonstrate that the proposed method reconstructs long DNA sequences with higher accuracy than current approaches in the literature.
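The fragment-first idea can be illustrated with a minimal sketch. This is not the paper's algorithm, whose criteria and rules are not given in the abstract. It assumes a spectrum made of fixed-length probes (l-mers) and uses one simple, hypothetical rule: two probes are chained only when their (l-1)-overlap is unambiguous in both directions. The names `spectrum` and `build_fragments` are illustrative.

```python
from collections import defaultdict


def spectrum(seq, l):
    """Ideal (error-free) spectrum: the set of all length-l substrings of seq."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}


def build_fragments(spec):
    """Chain probes into fragments wherever the (l-1)-overlap is unambiguous.

    Illustrative rule only (an assumption, not the published criteria):
    probe p is followed by probe q when q is the only probe starting with
    p's suffix and p is the only probe ending with q's prefix.
    """
    by_prefix, by_suffix = defaultdict(list), defaultdict(list)
    for p in sorted(spec):
        by_prefix[p[:-1]].append(p)
        by_suffix[p[1:]].append(p)

    def successor(p):
        nxt = by_prefix.get(p[1:], [])
        prev = by_suffix.get(p[1:], [])
        if len(nxt) == 1 and len(prev) == 1 and nxt[0] != p:
            return nxt[0]
        return None

    def has_predecessor(p):
        prev = by_suffix.get(p[:-1], [])
        return len(prev) == 1 and successor(prev[0]) == p

    fragments, used = [], set()
    starts = [p for p in sorted(spec) if not has_predecessor(p)]
    # The second pass over the spectrum picks up probes lying on pure cycles.
    for s in starts + sorted(spec):
        if s in used:
            continue
        frag, cur = s, s
        used.add(s)
        while True:
            nxt = successor(cur)
            if nxt is None or nxt in used:
                break
            frag += nxt[-1]  # extend the fragment by one nucleotide
            used.add(nxt)
            cur = nxt
        fragments.append(frag)
    return fragments


if __name__ == "__main__":
    clean = spectrum("ACGTTGCA", 3)
    print(build_fragments(clean))
    # A spurious probe (a positive error) creates an ambiguous overlap.
    print(build_fragments(clean | {"CGA"}))
```

With an error-free spectrum the whole sequence comes back as a single fragment. Once a spurious probe is added, the chain breaks at the ambiguous overlap and the spurious probe is isolated as its own short fragment, which is the kind of signal a later assembly step could use.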
