RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference

Motivation: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture, and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets.

Results: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability. It compares favorably to IQ-Tree, an increasingly popular recent tool for ML-based phylogenetic inference. Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric.

Availability: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. The RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/.

Contact: alexey.kozlov@h-its.org
