Dealing with repetitions in sequencing by hybridization

The biochemical experiment in DNA sequencing by hybridization (SBH) introduces errors. Some of them are random and disappear when the experiment is repeated. Others are systematic and arise from repeated probes within the target sequence. A good method for solving SBH problems must deal with both types of errors. In this work we propose a new hybrid genetic algorithm for isothermic and standard sequencing that incorporates the concept of structured combinations. The algorithm is compared with other methods designed to handle errors arising in standard and isothermic SBH approaches. The DNA sequences used for testing are taken from GenBank. The test instances were divided into two groups. The first group consisted of sequences with positive and negative errors in the spectrum, at a rate of up to 20%, excluding errors coming from repetitions. The second group consisted of sequences containing repeated oligonucleotides, with additional errors of up to 5% added to the spectra. Our new method outperforms the best alternative procedures on both data sets. Moreover, in the cases without repetitions the method produces solutions exhibiting an extremely high degree of similarity to the target sequences, which is an important outcome for biologists. The spectra prepared from the sequences taken from GenBank are available on our website http://bio.cs.put.poznan.pl/.
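
To make the role of repetitions concrete, the following minimal sketch builds the ideal spectrum of a target sequence for standard SBH with fixed-length probes, and lists the probes that occur more than once. Because the spectrum is a set, every repeated probe is recorded only once, which the reconstruction sees as a negative error. The function names, the toy sequence, and the probe length are illustrative assumptions; this is not the paper's algorithm and it does not cover isothermic libraries.

```python
from collections import Counter


def probe_windows(sequence, probe_length):
    """Return every length-`probe_length` window of the sequence, in order."""
    last_start = len(sequence) - probe_length
    return [sequence[i:i + probe_length] for i in range(last_start + 1)]


def ideal_spectrum(sequence, probe_length):
    """Error-free spectrum: the set of distinct probes in the sequence."""
    return set(probe_windows(sequence, probe_length))


def repeated_probes(sequence, probe_length):
    """Map each probe that occurs more than once to its occurrence count."""
    counts = Counter(probe_windows(sequence, probe_length))
    return {probe: n for probe, n in counts.items() if n > 1}


if __name__ == "__main__":
    target = "ACGTACGA"  # hypothetical toy sequence
    length = 3           # hypothetical probe length

    windows = probe_windows(target, length)
    spectrum = ideal_spectrum(target, length)

    print("windows in target:", len(windows))
    print("distinct probes in spectrum:", len(spectrum))
    print("repeated probes:", repeated_probes(target, length))
    # Each extra occurrence of a repeated probe is lost from the spectrum.
    print("lost to repetition:", len(windows) - len(spectrum))
```

In this toy case the probe ACG occurs twice, so the spectrum holds one element fewer than the number of probe positions in the target, even though the experiment itself is assumed to be error-free.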
