Genetic algorithms and parallel processing in maximum-likelihood phylogeny inference.

We investigated the usefulness of a parallel genetic algorithm for phylogenetic inference under the maximum-likelihood (ML) optimality criterion. Parallelization was accomplished by assigning each "individual" in the genetic algorithm "population" to a separate processor, so that the number of processors used equaled the size of the evolving population (plus one additional processor to control operations). The genetic algorithm incorporated branch-length and topological mutation, recombination, selection on the ML score, and (in some cases) migration and recombination among subpopulations. We tested this parallel genetic algorithm on large (228-taxon) data sets of both empirically observed DNA sequence data (for angiosperms) and simulated DNA sequence data. For both observed and simulated data, search-time improvement was nearly linear with respect to the number of processors, so the parallelization strategy appears to be highly effective at reducing computation time for large phylogenetic problems using the genetic algorithm. We also explored various ways of optimizing and tuning the parameters of the genetic algorithm. Under the conditions of our analyses, however, the genetic algorithm did not find the best-known solution before each run was terminated. We discuss some possible limitations of the current implementation of this genetic algorithm and avenues for its future improvement.
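The controller-plus-workers arrangement described above can be illustrated with a minimal sketch. It is not the implementation evaluated here: `toy_score` is a hypothetical stand-in for the ML score, individuals are plain lists of numbers rather than trees, the elitism step and all function and parameter names are assumptions made for illustration, and a thread pool stands in for separate processors (so it shows the control flow, not a real speedup).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def toy_score(individual):
    """Hypothetical stand-in for the ML score (higher is better)."""
    return -sum((x - 0.5) ** 2 for x in individual)


def mutate(individual, rng, step=0.1):
    """Perturb one value, loosely analogous to a branch-length mutation."""
    child = list(individual)
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-step, step)
    return child


def recombine(a, b, rng):
    """One-point crossover between two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def run_ga(pop_size=8, n_params=5, generations=30, seed=0):
    """Return the best score seen in each generation."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_params)]
                  for _ in range(pop_size)]
    best_per_generation = []
    # One worker per individual; the calling thread plays the controller.
    with ThreadPoolExecutor(max_workers=pop_size) as pool:
        for _ in range(generations):
            # Workers score their individuals concurrently.
            scores = list(pool.map(toy_score, population))
            order = sorted(range(pop_size), key=lambda i: scores[i],
                           reverse=True)
            ranked = [population[i] for i in order]
            best_per_generation.append(scores[order[0]])
            # Controller: selection on the score, then recombination
            # and mutation to build the next generation.
            parents = ranked[: pop_size // 2]
            children = [ranked[0]]  # assumed elitism: keep the best as-is
            while len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                children.append(mutate(recombine(a, b, rng), rng))
            population = children
    return best_per_generation
```

Because the best individual is carried over unchanged in this sketch, the per-generation best score never decreases; in a real setting the scoring step is the expensive likelihood evaluation, which is why distributing it across processors dominates the speedup.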
