Genetic algorithms, operators, and DNA fragment assembly

We study different genetic algorithm operators for a permutation problem associated with the Human Genome Project: the assembly of DNA sequence fragments, taken from a parent clone whose sequence is unknown, into a consensus sequence corresponding to the parent sequence. The sorted-order representation, which does not require specialized operators, is compared with a more traditional permutation representation, which does. The two representations and their associated operators are compared on problems ranging from 2 KB to 34 KB (kilobase pairs). Edge-recombination crossover used in conjunction with several specialized operators performs best in these experiments; this combination solved a 10 KB sequence consisting of 177 fragments with no manual intervention. Natural building blocks in the problem are exploited at progressively higher levels through "macro-operators," which significantly improves performance.
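The contrast between the two representations, and the edge-recombination operator, can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. Fragments are assumed to be identified by integer indices, a layout is a linear ordering of those indices, and `overlap[a][b]` is a hypothetical precomputed score for fragment `a` followed by fragment `b`. `sorted_order_decode` shows why the sorted-order representation needs no specialized operators: any vector of real-valued keys sorts into a valid permutation. `edge_recombination` follows the usual edge-recombination scheme, which builds an adjacency map from both parents and then repeatedly moves to the neighbour with the fewest remaining edges. It is adapted here to linear rather than cyclic orderings. All function names and the fitness function are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set


def sorted_order_decode(keys: Sequence[float]) -> List[int]:
    """Decode a sorted-order (random-key) chromosome into a fragment ordering.

    Fragment i is placed according to the rank of keys[i], so any vector of
    real numbers decodes to a valid permutation, and ordinary crossover and
    mutation on the keys never yield an invalid layout.
    """
    return sorted(range(len(keys)), key=lambda i: keys[i])


def edge_recombination(p1: Sequence[int], p2: Sequence[int],
                       rng: random.Random) -> List[int]:
    """Edge-recombination crossover on two permutations of the same elements.

    Assumption: orderings are linear (first and last fragments are not
    adjacent), unlike the cyclic tours of the travelling-salesman version.
    """
    # Edge map: for each element, the set of its neighbours in either parent.
    edges: Dict[int, Set[int]] = {v: set() for v in p1}
    for parent in (p1, p2):
        for i, v in enumerate(parent):
            if i > 0:
                edges[v].add(parent[i - 1])
            if i + 1 < len(parent):
                edges[v].add(parent[i + 1])

    current = p1[0]  # start from the first element of the first parent
    child = [current]
    remaining = set(p1) - {current}
    while remaining:
        # The element just placed can no longer be chosen; drop it everywhere.
        for nbrs in edges.values():
            nbrs.discard(current)
        candidates = edges[current]
        if candidates:
            # Prefer the neighbour with the fewest remaining edges; break ties randomly.
            fewest = min(len(edges[c]) for c in candidates)
            current = rng.choice(sorted(c for c in candidates if len(edges[c]) == fewest))
        else:
            # No inherited edge is left: continue from a random unplaced element.
            current = rng.choice(sorted(remaining))
        child.append(current)
        remaining.discard(current)
    return child


def layout_fitness(order: Sequence[int], overlap: Sequence[Sequence[int]]) -> int:
    """Hypothetical fitness: sum of overlap scores of adjacent fragments."""
    return sum(overlap[a][b] for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    rng = random.Random(0)
    print(sorted_order_decode([0.7, 0.1, 0.9, 0.4]))  # [1, 3, 0, 2]
    p1 = [0, 1, 2, 3, 4, 5]
    p2 = [3, 1, 0, 5, 2, 4]
    print(edge_recombination(p1, p2, rng))
```

The design difference shows up directly in the code. The sorted-order route reuses generic operators on the key vector and only recovers an ordering when decoding. Edge recombination works directly on adjacencies, which in this problem correspond to fragment overlaps.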
