Efficient assembling of genome fragments using genetic algorithm enhanced by heuristic search

Shotgun sequencing is the state-of-the-art technique for decoding genome sequences. However, it produces a large number of fragments, and combining those fragments correctly is computationally expensive. In our previous work, we showed that a genetic algorithm (GA) can solve this problem efficiently. In this work, we add two heuristics to the GA to make it more efficient. The first is a chromosome reduction (CRed) step, which shortens the chromosomes that participate in the genetic search. The second is a chromosome refinement (CRef) step, a greedy heuristic that rearranges the bits using domain knowledge to locally improve the fitness of chromosomes. With this hybridization and a simple scaffold list, the GA obtains longer contigs and scaffolds. We tested the algorithm on three real genome data sets. We succeeded in restructuring contigs covering about 90% of the target genome sequences and in assembling about 500 to 1,000 fragments into 3 to 11 scaffolds. All experiments were run on common desktop machines.
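The abstract does not spell out how the chromosomes are encoded or how CRef rearranges them, so the following is only an illustrative sketch of the general idea of greedy local refinement, not the authors' procedure. It assumes a chromosome is a permutation of fragment indices, that fitness is the sum of suffix-prefix overlaps between neighbouring fragments, and that a refinement move relocates one fragment to another position. The names `overlap`, `fitness`, and `refine` are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def fitness(order, frags):
    """Assumed fitness: total overlap between consecutive fragments."""
    return sum(overlap(frags[i], frags[j]) for i, j in zip(order, order[1:]))


def refine(order, frags):
    """Greedy refinement (assumed move set): relocate one fragment at a
    time and keep any relocation that strictly increases the fitness."""
    best = list(order)
    best_fit = fitness(best, frags)
    improved = True
    while improved:
        improved = False
        for p in range(len(best)):
            for q in range(len(best)):
                if p == q:
                    continue
                cand = best[:]
                cand.insert(q, cand.pop(p))  # move fragment from p to q
                f = fitness(cand, frags)
                if f > best_fit:
                    best, best_fit, improved = cand, f, True
    return best, best_fit


if __name__ == "__main__":
    # Toy fragments, not taken from the paper's data sets.
    frags = ["ACGTAC", "TACGGA", "GGATTC"]
    print(refine([2, 0, 1], frags))
```

In a hybrid GA of this kind, such a refinement would typically be applied to some or all chromosomes after crossover and mutation, so the genetic search explores globally while the greedy step improves each candidate locally.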
