Assembling DNA fragments with parallel algorithms

As more research centers embark on sequencing new genomes, the problem of DNA fragment assembly for shotgun sequencing is growing in importance and complexity. Accurate and fast assembly is a crucial part of any sequencing project, and because the DNA fragment assembly problem is NP-hard, exact solutions are very difficult to obtain. Various heuristics, including genetic algorithms, have been designed to solve the fragment assembly problem. While the sequential genetic algorithm has given good results, it is unable to sequence very large DNA molecules. In this work, we present two parallel methods, a distributed genetic algorithm and a parallel simulated annealing algorithm, that accurately solve problem instances 77K base pairs long.
