DNA Fragment Assembly Using Multi-Objective Genetic Algorithms

The DNA Fragment Assembly Problem (FAP) concerns the reconstruction of a target DNA sequence from several hundred (or thousand) sequenced fragments by identifying the correct order and orientation of each fragment in the layout. Several algorithms have been proposed for solving FAP. Most of them pursue the single objective of maximizing the sum of the overlaps between adjacent fragments in order to optimize the fragment layout. This paper formulates FAP as a bi-objective optimization problem whose two objectives are maximizing the overlap between adjacent fragments and minimizing the overlap between distant fragments. Moreover, since layouts with fewer contigs are more desirable, FAP can be extended to a tri-objective optimization problem in which minimizing the number of contigs is the additional objective. Both problems are solved using the multi-objective genetic algorithm NSGA-II. The experimental results show that the NSGA-II-based Bi-Objective Fragment Assembly Algorithm (BOFAA) and Tri-Objective Fragment Assembly Algorithm (TOFAA) produce better-quality layouts than those generated by the GA-based Single Objective Fragment Assembly Algorithm (SOFAA). Further, the layouts produced by TOFAA are better than those produced by BOFAA.
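
The three objectives can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the overlap score is the length of the longest exact suffix-prefix match, "distant" means any non-adjacent pair in layout order, a contig break is declared wherever an adjacent overlap falls below a hypothetical `min_overlap` threshold, and fragment orientation (reverse complements) is ignored. The `dominates` helper shows the Pareto comparison on which NSGA-II-style ranking is built; it is not the paper's implementation.

```python
from typing import List, Tuple


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Assumption: exact matching only; a real assembler would use an
    alignment score that tolerates sequencing errors.
    """
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def objectives(layout: List[str], min_overlap: int = 3) -> Tuple[int, int, int]:
    """Return (adjacent_overlap, distant_overlap, contigs) for a layout.

    adjacent_overlap: to be maximized.
    distant_overlap:  to be minimized (all pairs at least two positions apart).
    contigs:          to be minimized (1 + number of weak adjacent joins).
    """
    n = len(layout)
    adjacent = [overlap(x, y) for x, y in zip(layout, layout[1:])]
    distant = sum(
        overlap(layout[i], layout[j])
        for i in range(n)
        for j in range(i + 2, n)
    )
    contigs = 1 + sum(1 for score in adjacent if score < min_overlap)
    return sum(adjacent), distant, contigs


def dominates(p: Tuple[int, int, int], q: Tuple[int, int, int]) -> bool:
    """True if objective vector `p` Pareto-dominates `q`."""
    # Negate the first objective so that every component is minimized.
    p_min = (-p[0], p[1], p[2])
    q_min = (-q[0], q[1], q[2])
    no_worse = all(a <= b for a, b in zip(p_min, q_min))
    strictly_better = any(a < b for a, b in zip(p_min, q_min))
    return no_worse and strictly_better


if __name__ == "__main__":
    good = ["AAACCC", "CCCGGG", "GGGTTT"]  # fragments in their true order
    bad = ["AAACCC", "GGGTTT", "CCCGGG"]   # last two fragments swapped
    print(objectives(good))  # (6, 0, 1)
    print(objectives(bad))   # (0, 3, 3)
    print(dominates(objectives(good), objectives(bad)))  # True
```

In this toy case the correctly ordered layout is better on all three objectives, so it dominates the shuffled one. Dropping the third component of the tuple gives the bi-objective variant under the same assumptions.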
