Improving the Efficiency of p-ECR Moves in Evolutionary Tree Search Methods Based on Maximum Likelihood by Neighbor Joining

Inference of evolutionary trees using the maximum likelihood principle is NP-hard, so all practical methods rely on heuristics. The topological transformations most often used in these heuristics are nearest neighbor interchange (NNI), subtree prune and regraft (SPR), and tree bisection and reconnection (TBR). However, searches based on these transformations easily become trapped in local optima, because few trees are accessible in one step from any given tree. A more exhaustive topological transformation is p-Edge Contraction and Refinement (p-ECR), but because of its high computational complexity it has rarely been used in practice. This paper proposes a method, p-ECRNJ, with O(p^3) time complexity that makes the p-ECR move efficient by using neighbor joining (NJ) to refine the unresolved nodes produced in p-ECR. Moreover, the demonstrated topological accuracy of NJ on small datasets can guarantee the accuracy of the p-ECRNJ move. Experiments with simulated and real datasets show that p-ECRNJ can find better trees than the best maximum likelihood methods known so far, and that it can efficiently improve on local topological transformations in reasonable time.
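The refinement step can be illustrated with a minimal sketch of textbook neighbor joining, which is not the authors' implementation. It assumes that each label stands for one of the subtrees attached to an unresolved node left by edge contraction, and that a pairwise distance between those subtrees is already available; how p-ECRNJ obtains such distances is not stated in the abstract. The function name `neighbor_joining` and the `frozenset`-keyed distance table are hypothetical choices made for this illustration.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Textbook NJ agglomeration (topology only, no branch lengths).

    names: list of labels (here: subtrees around an unresolved node).
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns (joins, remaining): joins is a list of (a, b, new_node) tuples
    in the order they were made; remaining holds the last three nodes,
    which meet at a single internal node of the resolved tree.
    """
    nodes = list(names)
    d = dict(dist)  # work on a copy so the caller's table is untouched
    joins = []
    counter = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Sum of distances from each node to all other current nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i)
             for i in nodes}
        # Join the pair that minimizes the NJ criterion Q.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        u = f"u{counter}"  # label for the new internal node
        counter += 1
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((a, k))]
                    + d[frozenset((b, k))]
                    - d[frozenset((a, b))]
                )
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b, u))
    return joins, nodes


# Four subtrees with additive distances from the tree ((A,B),(C,D)).
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 3, frozenset(("A", "C")): 5,
    frozenset(("A", "D")): 6, frozenset(("B", "C")): 6,
    frozenset(("B", "D")): 7, frozenset(("C", "D")): 7,
}
joins, remaining = neighbor_joining(names, dist)
```

Each pass of the loop scans all pairs of the current nodes, so this naive version takes cubic time in the number of items joined. Applied to the roughly p + 3 subtrees around a node created by contracting p edges, that is in line with the O(p^3) bound stated above, although the paper's actual procedure may differ in detail.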
