DNA Sequence Assembly and Genetic Algorithms - New Results and Puzzling Insights

Applying genetic algorithms to DNA sequence assembly is not straightforward. Non-standard and even counter-intuitive parameter settings have yielded significant improvements in performance, solution quality, and the range of problem sizes to which the approach applies. Specifically, the solution time for a 10kb data set was reduced by an order of magnitude, and a 20kb data set that the genetic algorithm had previously failed to solve was solved in a time that represents only a linear increase over the 10kb data set. Significant progress has also been made on a 35kb data set of real biological data: a single-contig solution was found for a 752-fragment subset, and a 15-contig solution was found for the full data set. This paper discusses the new results, the modifications made to the previous genetic algorithm, the experimental design process by which the new results were obtained, the questions these results raise, and some preliminary attempts to explain them.
