A Multi-Criterion Evolutionary Approach Applied to Phylogenetic Reconstruction

Phylogenetic inference is one of the central problems in computational biology. It consists of finding the tree that best explains the evolutionary history of a set of species from a given dataset. Various phylogenetic reconstruction methods have been proposed in the literature. Most of them use a single optimality criterion (or objective function) to evaluate candidate solutions and determine the best tree. However, several studies (Huelsenbeck, 1995; Kuhner & Felsenstein, 1994; Tateno et al., 1994) have shown important differences in the results obtained by applying distinct reconstruction methods to the same input data. Rokas et al. (2003) pointed out that there are several sources of incongruity in phylogenetic analysis: the optimality criterion employed, the datasets used, and the evolutionary assumptions made about the data. In other words, according to the literature, the choice of reconstruction method has a great influence on the results.

In this context, a multi-objective approach can be a relevant contribution, since it can search for phylogenies using more than one criterion and produce trees that are consistent with all the criteria employed. Recently, Handl et al. (2006) discussed current and future applications of multi-objective optimization in bioinformatics and computational biology problems. Poladian & Jermiin (2006) showed how multi-objective optimization can be used in phylogenetic inference from various conflicting datasets. The authors highlighted that this approach reveals the sources of such conflicts and provides useful information for a robust inference. Coelho et al. (2007) proposed a multi-objective Artificial Immune System (De Castro & Timmis, 2002) approach for the reconstruction of phylogenetic trees. The developed algorithm, called omni-aiNet, was employed to find a set of Pareto-optimal trees that represent a trade-off between the minimum evolution (Kidd & Sgaramella, 1971) and the least-squares (Cavalli-Sforza & Edwards, 1967) criteria. Compared to the tree found by the Neighbor Joining (NJ) algorithm (Saitou & Nei, 1987), the solutions obtained by omni-aiNet have better minimum evolution and least-squares scores.

In this paper, we propose a multi-objective approach for phylogenetic reconstruction using the maximum parsimony (Fitch, 1972) and maximum likelihood (Felsenstein, 1981) criteria. The basis of this approach and preliminary results were presented in Cancino & Delbem (2007a,b). The proposed technique, called PhyloMOEA, is a multi-objective evolutionary algorithm (MOEA) based on NSGA-II (Deb, 2001). The PhyloMOEA output is a set of
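
To make the notion of a set of Pareto-optimal trees under two criteria concrete, the following is a minimal sketch and not the PhyloMOEA implementation. It assumes each candidate tree has already been reduced to a pair of scores to be minimised, a parsimony score and a negative log-likelihood. The score values and the names `dominates` and `pareto_front` are hypothetical and chosen only for illustration. NSGA-II itself uses a more elaborate non-dominated sorting combined with a crowding-distance measure, which this sketch does not reproduce.

```python
from typing import List, Tuple

# Each candidate tree is summarised by two scores, both to be minimised:
# (parsimony score, negative log-likelihood). The numbers are invented.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on every criterion and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Score]) -> List[Score]:
    """Return the scores that no other score dominates (input order preserved)."""
    return [s for s in scores if not any(dominates(t, s) for t in scores)]


# Hypothetical candidate trees.
candidates = [
    (120.0, 2310.5),
    (118.0, 2325.0),
    (125.0, 2305.2),
    (121.0, 2330.0),  # worse than (120.0, 2310.5) on both criteria
    (118.0, 2326.0),  # same parsimony as (118.0, 2325.0), worse likelihood
]

front = pareto_front(candidates)
print(front)
```

In this toy data three trees remain: none of them is better than the others on both criteria at once, which is the trade-off a multi-criterion method reports instead of a single best tree.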

[1]  Ming-Yang Kao,et al.  Phylogeny Reconstruction , 2008, Encyclopedia of Algorithms.

[2]  Thomas Ludwig,et al.  New fast and accurate heuristics for inference of large phylogenetic trees , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..

[3]  Hagit Attiya,et al.  Wiley Series on Parallel and Distributed Computing , 2004, SCADA Security: Machine Learning Concepts for Intrusion Detection and Prevention.

[4]  B. Rannala,et al.  Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference , 1996, Journal of Molecular Evolution.

[5]  Ziheng Yang,et al.  Computational Molecular Evolution , 2006 .

[6]  P. Lewis,et al.  A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data. , 1998, Molecular biology and evolution.

[7]  Pablo Moscato,et al.  Inferring Phylogenetic Trees Using Evolutionary Algorithms , 2002, PPSN.

[8]  H. Kishino,et al.  Dating of the human-ape splitting by a molecular clock of mitochondrial DNA , 2005, Journal of Molecular Evolution.

[9]  Derrick J. Zwickl Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion , 2006 .

[10]  James R. Cole,et al.  The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis , 2004, Nucleic Acids Res..

[11]  Joseph Felsenstein,et al.  Computational Molecular Evolution. Oxford Series in Ecology and Evolution. By Ziheng Yang. Oxford and New York: Oxford University Press. $115.00 (hardcover); $52.50 (paper). xvi + 357 p.; ill.; index. 0-19-856699-9 (hc); 0-19-856702-2 (pb). 2006. , 2008 .

[12]  M. P. Cummings PHYLIP (Phylogeny Inference Package) , 2004 .

[13]  J. Huelsenbeck Performance of Phylogenetic Methods in Simulation , 1995 .

[14]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[15]  Jonathan Timmis,et al.  Artificial Immune Systems: A New Computational Intelligence Approach , 2003 .

[16]  Max Ingman,et al.  mtDB: Human Mitochondrial Genome Database, a resource for population genetics and medical sciences , 2005, Nucleic Acids Res..

[17]  Bret Larget,et al.  Faster likelihood calculations on trees , 1998 .

[18]  S. Carroll,et al.  Genome-scale approaches to resolving incongruence in molecular phylogenies , 2003, Nature.

[19]  Ziheng Yang,et al.  PAML: a program package for phylogenetic analysis by maximum likelihood , 1997, Comput. Appl. Biosci..

[20]  R. K. Ursem Multi-objective Optimization using Evolutionary Algorithms , 2009 .

[21]  D. E. Goldberg,et al.  Genetic Algorithms in Search , 1989 .

[22]  B. Larget,et al.  Markov Chain Monte Carlo Algorithms for the Bayesian Analysis of Phylogenetic Trees , 2000 .

[23]  Atte Moilanen,et al.  Simulated Evolutionary Optimization and Local Search: Introduction and Application to Tree Search , 2001 .

[24]  O. Gascuel,et al.  A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. , 2003, Systematic biology.

[25]  L. Jin,et al.  Limitations of the evolutionary parsimony method of phylogenetic analysis. , 1990, Molecular biology and evolution.

[26]  P. Goloboff METHODS FOR FASTER PARSIMONY ANALYSIS , 1996 .

[27]  W. Fitch Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology , 1971 .

[28]  O Gascuel,et al.  BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. , 1997, Molecular biology and evolution.

[29]  Fredrik Ronquist Fast Fitch-Parsimony Algorithms for Large Data Sets , 1998 .

[30]  J. Felsenstein,et al.  A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. , 1994, Molecular biology and evolution.

[31]  N. Saitou,et al.  The neighbor-joining method: a new method for reconstructing phylogenetic trees. , 1987, Molecular biology and evolution.

[32]  Enrique Alba,et al.  Parallel Metaheuristics: A New Class of Algorithms , 2005 .

[33]  Alexandre C. B. Delbem,et al.  Multi-Criterion Phylogenetic Inference using Evolutionary Algorithms , 2007, 2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology.

[34]  Hideo Matsuda,et al.  Construction of Phylogenetic Trees from Amino Acid Sequences using a Genetic Algorithm , 1995 .

[35]  Thomas Ludwig,et al.  Accelerating Parallel Maximum Likelihood-Based Phylogenetic Tree Calculations Using Subtree Equality Vectors , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[36]  A. Lemmon,et al.  The metapopulation genetic algorithm: An efficient solution for the problem of large phylogeny estimation , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[37]  Clare Bates Congdon Gaphyl: An Evolutionary Algorithms Approach For The Study Of Natural Evolution , 2002, GECCO.

[38]  Alexandros Stamatakis,et al.  Phylogenetic models of rate heterogeneity: a high performance computing perspective , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[39]  Fernando José Von Zuben,et al.  A Multiobjective Approach to Phylogenetic Trees: Selecting the Most Promising Solutions from the Pareto Front , 2007, Seventh International Conference on Intelligent Systems Design and Applications (ISDA 2007).

[40]  Leon Poladian,et al.  Multi-objective evolutionary algorithms and phylogenetic inference with multiple data sets , 2006, Soft Comput..

[41]  Marco Laumanns,et al.  SPEA2: Improving the strength pareto evolutionary algorithm , 2001 .

[42]  A. Rodrigo,et al.  Likelihood-based tests of topologies in phylogenetics. , 2000, Systematic biology.

[43]  J. Felsenstein Evolutionary trees from DNA sequences: A maximum likelihood approach , 2005, Journal of Molecular Evolution.

[44]  M. Nei,et al.  Relative efficiencies of the maximum-likelihood, neighbor-joining, and maximum-parsimony methods when substitution rate varies with site. , 1994, Molecular biology and evolution.

[45]  Ziheng Yang Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods , 1994, Journal of Molecular Evolution.

[46]  Joshua D. Knowles,et al.  Multiobjective Optimization in Bioinformatics and Computational Biology , 2007, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[47]  L. Cavalli-Sforza,et al.  PHYLOGENETIC ANALYSIS: MODELS AND ESTIMATION PROCEDURES , 1967, Evolution; international journal of organic evolution.

[48]  D. Sankoff Simultaneous Solution of the RNA Folding, Alignment and Protosequence Problems , 1985 .

[49]  Rae Baxter,et al.  Acknowledgments.-The authors would like to , 1982 .

[50]  K. Kidd,et al.  Phylogenetic analysis: concepts and methods. , 1971, American journal of human genetics.

[51]  Kazutaka Katoh,et al.  Genetic Algorithm-Based Maximum-Likelihood Analysis for Molecular Phylogeny , 2001, Journal of Molecular Evolution.

[52]  David Corne,et al.  The Pareto archived evolution strategy: a new baseline algorithm for Pareto multiobjective optimisation , 1999, Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406).

[53]  P. Goloboff Analyzing Large Data Sets in Reasonable Times: Solutions for Composite Optima , 1999, Cladistics : the international journal of the Willi Hennig Society.