eSBH: An Accurate Constructive Heuristic Algorithm for DNA Sequencing by Hybridization

Sequencing by hybridization is a promising, cost-effective technology for high-throughput DNA sequencing on microarray chips. However, spectrum errors arising from experimental conditions make fast and accurate reconstruction of the original sequence a challenging problem. Over the last decade, a variety of analyses and designs have been proposed to address this problem, each with a different tradeoff between speed and accuracy. Motivated by the idea that errors can be identified by analyzing the interrelations among spectrum elements, this paper presents a new constructive heuristic algorithm whose reconstruction is guided by a set of well-defined criteria and rules. Experiments on benchmark instance sets demonstrate that the proposed method reconstructs long DNA sequences more accurately than existing approaches in the literature.
