Applying a multiobjective metaheuristic inspired by honey bees to phylogenetic inference

The development of multiobjective metaheuristics has allowed bioinformaticians to address optimization problems in computational biology in which several objective functions must be considered simultaneously. One of the most relevant research topics that can benefit from these techniques is phylogenetic inference. Over the years, researchers have proposed different optimality criteria for reconstructing ancestral evolutionary relationships among species. As a result, biologists often report different phylogenetic trees from the same dataset when they apply distinct optimality principles. In this work, we detail a multiobjective swarm intelligence approach for inferring phylogenies, based on the novel Artificial Bee Colony algorithm. The aim of this paper is to propose a complementary view of phylogenetics that combines the maximum parsimony and maximum likelihood criteria, in order to generate a set of phylogenetic trees that represent a compromise between these principles. Experimental results on a variety of nucleotide data sets, together with statistical studies, highlight the relevance of the proposal with regard to other multiobjective algorithms and state-of-the-art biological methods.
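The notion of a set of trees that "represent a compromise" between the two criteria can be illustrated with Pareto dominance. The following is a minimal sketch, not the method described in the paper: it assumes each candidate tree has already been scored, that the parsimony score is minimized and the log-likelihood is maximized, and it uses hypothetical names (`ScoredTree`, `dominates`, `pareto_front`) and made-up scores chosen only for illustration.

```python
from typing import List, NamedTuple


class ScoredTree(NamedTuple):
    """Hypothetical container: a tree label plus its two objective values."""
    name: str
    parsimony: int          # number of changes; lower is better (assumption)
    log_likelihood: float   # higher (closer to zero) is better (assumption)


def dominates(a: ScoredTree, b: ScoredTree) -> bool:
    """True if `a` is no worse than `b` on both objectives and better on one."""
    no_worse = (a.parsimony <= b.parsimony
                and a.log_likelihood >= b.log_likelihood)
    strictly_better = (a.parsimony < b.parsimony
                       or a.log_likelihood > b.log_likelihood)
    return no_worse and strictly_better


def pareto_front(trees: List[ScoredTree]) -> List[ScoredTree]:
    """Keep only the trees that no other tree dominates."""
    return [t for t in trees if not any(dominates(o, t) for o in trees)]


# Illustrative scores only; they do not come from any real data set.
candidates = [
    ScoredTree("T1", 2450, -13010.5),
    ScoredTree("T2", 2445, -13025.0),
    ScoredTree("T3", 2460, -13030.0),  # worse than T1 on both objectives
    ScoredTree("T4", 2455, -13005.2),
]

front = pareto_front(candidates)
print([t.name for t in front])
```

In this toy example T3 is discarded because T1 beats it on both criteria, while T1, T2 and T4 each trade parsimony against likelihood and therefore all remain in the compromise set.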
