ML or NJ-MCL? A comparison between two robust phylogenetic methods

Large-scale gene sequencing offers an opportunity to reconstruct the tree of life and to infer multigene species phylogenies from very large datasets. Reconstructing large-scale phylogenies requires a method that is both computationally efficient and accurate. Current efforts toward this goal include NJ-MCL, described by Tamura et al. (2004; 2007), which combines the neighbor-joining (NJ) algorithm with maximum composite likelihood (MCL) distance estimation. Although the NJ-MCL method has been reported to perform better than the NJ method, studies comparing the accuracy of the maximum likelihood (ML) and NJ-MCL methods are lacking. Here, the accuracy of the NJ-MCL and ML methods is examined. The concatenation approach (progressive addition of genes) is used in a biologically realistic computer simulation to assess the accuracy of the two methods. The simulation results show that although NJ-MCL is computationally efficient and outperforms the NJ method, the ML method is much more accurate than NJ-MCL. The results encourage the use of the ML algorithm where datasets include up to several hundred species, whereas NJ-MCL is preferred for reconstructing grand-scale phylogenies (i.e., where several thousand taxa are included).
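
The concatenation approach by progressive addition of genes can be illustrated with a minimal Python sketch. It assumes that every gene is already aligned over the same set of taxa; the function name `progressive_concatenations`, the `Alignment` type alias, and the toy sequences are hypothetical and are not taken from the study. Each yielded supermatrix would then be passed to a tree-building method (ML or NJ-MCL), a step that is outside this sketch.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical representation: taxon name -> aligned sequence for one gene.
Alignment = Dict[str, str]


def progressive_concatenations(
    genes: List[Tuple[str, Alignment]],
) -> Iterator[Tuple[int, Alignment]]:
    """Yield (k, supermatrix) for k = 1..len(genes).

    The k-th supermatrix is the concatenation of the first k gene
    alignments, so each step adds exactly one gene to the previous one.
    """
    if not genes:
        return
    taxa = sorted(genes[0][1])
    combined = {taxon: "" for taxon in taxa}
    for k, (name, alignment) in enumerate(genes, start=1):
        # Assumption: all genes cover the same taxa (no missing data).
        if sorted(alignment) != taxa:
            raise ValueError(f"gene {name!r} has a different taxon set")
        # All sequences within one gene alignment must have equal length.
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"gene {name!r} is not aligned")
        for taxon in taxa:
            combined[taxon] += alignment[taxon]
        # Yield a copy so later steps do not mutate earlier results.
        yield k, dict(combined)


if __name__ == "__main__":
    toy_genes = [
        ("gene1", {"taxonA": "ACGT", "taxonB": "ACGA"}),
        ("gene2", {"taxonA": "TT", "taxonB": "TA"}),
    ]
    for k, matrix in progressive_concatenations(toy_genes):
        print(k, matrix)
```

Yielding one supermatrix per added gene makes it straightforward to track how the accuracy of an inferred tree changes as more genes are concatenated.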

[1] Sudhindra R. Gadagkar, et al. Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree, 2005, Journal of Experimental Zoology. Part B, Molecular and Developmental Evolution.

[2] M. Nei, et al. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0, 2007, Molecular Biology and Evolution.

[3] N. Saitou, et al. The neighbor-joining method: a new method for reconstructing phylogenetic trees, 1987, Molecular Biology and Evolution.

[4] J. Felsenstein. Confidence limits on phylogenies: an approach using the bootstrap, 1985, Evolution; International Journal of Organic Evolution.

[5] M. Nei, et al. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees, 1993, Molecular Biology and Evolution.

[6] J. Bull, et al. Partitioning and combining data in phylogenetic analysis, 1993.

[7] M. Nei, et al. Prospects for inferring very large phylogenies by using the neighbor-joining method, 2004, Proceedings of the National Academy of Sciences of the United States of America.

[8] Reed A. Cartwright, et al. DNA assembly with gaps (Dawg): simulating sequence evolution, 2005, Bioinform.

[9] S. O’Brien, et al. Molecular phylogenetics and the origins of placental mammals, 2001, Nature.

[10] Thomas Uzzell, et al. Fitting Discrete Probability Distributions to Evolutionary Events, 1971, Science.

[11] Terry Gaasterland, et al. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba, 2002, Proceedings of the National Academy of Sciences of the United States of America.

[12] L. Jin, et al. Limitations of the evolutionary parsimony method of phylogenetic analysis, 1990, Molecular Biology and Evolution.

[13] Z. Yang, et al. Among-site rate variation and its impact on phylogenetic analyses, 1996, Trends in Ecology & Evolution.

[14] Georg Fuellen, et al. The effect of heterotachy in multigene analysis using the neighbor joining method, 2009, Molecular Phylogenetics and Evolution.

[15] D. Soltis, et al. Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology, 1999, Nature.

[16] Olivier Gascuel, et al. PHYML Online: A Web Server for Fast Maximum Likelihood-Based Phylogenetic Inference, 2018.

[17] Sudhir Kumar, et al. Efficiency of the Neighbor-Joining Method in Reconstructing Deep and Shallow Evolutionary Relationships in Large Phylogenies, 2000, Journal of Molecular Evolution.

[18] Bernard M. E. Moret, et al. Phylogenetic Inference, 2011, Encyclopedia of Parallel Computing.

[19] O. Gascuel, et al. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood, 2003, Systematic Biology.

[20] M. Tristem. Molecular Evolution — A Phylogenetic Approach, 2000, Heredity.

[21] Joel Dudley, et al. MEGA: A biologist-centric software for evolutionary analysis of DNA and protein sequences, 2008, Briefings Bioinform.

[22] D. Robinson, et al. Comparison of phylogenetic trees, 1981.

[23] S. Carroll, et al. Genome-scale approaches to resolving incongruence in molecular phylogenies, 2003, Nature.

[24] S. O’Brien, et al. Placental mammal diversification and the Cretaceous–Tertiary boundary, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[25] Oliver Eulenstein, et al. Obtaining maximal concatenated phylogenetic data sets from large sequence databases, 2003, Molecular Biology and Evolution.

[26] J. Felsenstein. Cases in which Parsimony or Compatibility Methods will be Positively Misleading, 1978.

[27] Masami Hasegawa, et al. Rooting the eutherian tree: the power and pitfalls of phylogenomics, 2007, Genome Biology.