A Multi-objective Evolutionary Approach for Phylogenetic Inference

The phylogeny reconstruction problem consists of determining the tree that most accurately represents the evolutionary relationships among species. Different criteria have been employed to evaluate candidate solutions and to guide a search algorithm towards the best tree. However, these criteria may lead to distinct phylogenies that often conflict with one another. In this context, a multi-objective approach can be useful because it can produce a set of equally optimal trees (a Pareto front) with respect to all criteria. We propose a multi-objective evolutionary algorithm, named PhyloMOEA, which employs the maximum parsimony and maximum likelihood criteria to evaluate solutions. PhyloMOEA was tested on four datasets of nucleotide sequences. For all datasets, the algorithm found a Pareto front representing a trade-off between the two criteria. Moreover, the SH-test showed that most of the solutions have scores similar to those obtained by phylogenetic programs that use a single criterion.
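
To make the notion of a Pareto front concrete, the following is a minimal sketch of non-dominated filtering over candidate trees scored by two criteria. It is not PhyloMOEA's implementation: the function names `dominates` and `pareto_front` are hypothetical, the scores are made-up illustrative values, and it assumes each tree is summarized by a pair (parsimony score, negative log-likelihood) in which both values are minimized.

```python
from typing import List, Tuple

# Assumed representation: (parsimony score, negative log-likelihood).
# Both objectives are treated as "lower is better".
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """Return True if a is no worse than b on every objective
    and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Score]) -> List[int]:
    """Return the indices of the scores that no other score dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Made-up scores for five hypothetical candidate trees.
candidates = [
    (120, 1510.2),
    (118, 1525.7),
    (125, 1498.3),
    (121, 1530.0),  # dominated by the first tree
    (118, 1540.1),  # dominated by the second tree
]

front = pareto_front(candidates)
print(front)  # indices of the non-dominated trees
```

The three surviving trees each trade a better parsimony score for a worse likelihood or the reverse, which is the kind of trade-off the abstract describes. This quadratic filter is only meant to show the dominance relation; it says nothing about how the search itself is organized.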
