Genetic Algorithm-Based Maximum-Likelihood Analysis for Molecular Phylogeny

Abstract. A heuristic approach based on a genetic algorithm (GA) has been developed to search for the maximum-likelihood (ML) phylogenetic tree. It outputs the best tree as well as multiple alternative trees that are not significantly worse than the best one under the likelihood criterion; these near-optimum trees are then subjected to further statistical tests. The approach makes it possible to infer phylogenetic trees of more than 20 taxa on a PC cluster within practical time scales, while taking into account rate heterogeneity among sites. Computer simulations were conducted to compare the efficiency of the present approach with that of several likelihood-based and distance-based methods, using amino acid sequence data for relatively large numbers of taxa (5 to 24). The advantage of the ML method over distance-based methods grows as the simulation conditions become more realistic, that is, when an incorrect model is assumed or many taxa are involved. The approach was then applied to infer the universal tree from the concatenated amino acid sequences of vertically inherited genes shared by all genomes whose complete sequences have been reported. The inferred tree strongly supports the paraphyly of Archaea and a specific relationship between Eukarya and Crenarchaeota. Apart from the paraphyly of Archaea and some minor disagreements, the universal tree based on these genes is largely consistent with the universal tree based on SSU rRNA.
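The abstract does not say how "not significantly worse than the best one" is decided. The sketch below assumes a Kishino-Hasegawa-style criterion, which is common for comparing ML trees but is not confirmed as the paper's method. Under this criterion, a candidate tree is kept when its total log-likelihood deficit relative to the best tree is at most `z` standard errors. The standard error is estimated from per-site log-likelihood differences. The function names, the dictionary input format, and the default `z = 1.96` are hypothetical choices for illustration. The per-site log-likelihoods themselves would come from whatever tree search and substitution model is used.

```python
import math
from typing import Dict, List, Sequence


def total_lnl(site_lnl: Sequence[float]) -> float:
    """Total log-likelihood of a tree: the sum of its per-site log-likelihoods."""
    return sum(site_lnl)


def lnl_difference_se(best: Sequence[float], other: Sequence[float]) -> float:
    """Standard error of the total lnL difference between two trees,
    estimated from per-site differences (Kishino-Hasegawa-style; an assumption)."""
    n = len(best)
    if n != len(other) or n < 2:
        raise ValueError("need equal-length site vectors with at least 2 sites")
    diffs = [b - o for b, o in zip(best, other)]
    mean = sum(diffs) / n
    # Variance of a sum of n site terms: n times the sample variance.
    var = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    return math.sqrt(var)


def near_optimal_trees(site_lnls: Dict[str, Sequence[float]],
                       z: float = 1.96) -> List[str]:
    """Return the best tree plus every tree whose lnL deficit is <= z * SE.
    `z` is a hypothetical threshold, not a value taken from the paper."""
    best_name = max(site_lnls, key=lambda k: total_lnl(site_lnls[k]))
    best = site_lnls[best_name]
    kept = [best_name]
    for name, lnl in site_lnls.items():
        if name == best_name:
            continue
        delta = total_lnl(best) - total_lnl(lnl)
        if delta <= z * lnl_difference_se(best, lnl):
            kept.append(name)
    return kept
```

For example, `near_optimal_trees({"A": [-1.0, -2.0, -1.5, -3.0], "B": [-1.1, -2.1, -1.4, -3.2], "C": [-5.0, -6.0, -5.5, -7.0]})` keeps trees A and B and discards C. Real alignments have hundreds of sites, so the SE estimate is far more stable than in this toy input.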