Assessing Uncertainty in the Rooting of the SARS-CoV-2 Phylogeny

The rooting of the SARS-CoV-2 phylogeny is important for understanding the origin and early spread of the virus. Previously published phylogenies have used different rootings that do not always provide consistent results. We investigate several different strategies for rooting the SARS-CoV-2 tree and provide measures of statistical uncertainty for all methods. We show that methods based on the molecular clock tend to place the root in the B clade, while methods based on outgroup rooting tend to place the root in the A clade. The results from the two approaches are statistically incompatible, possibly as a consequence of deviations from a molecular clock or excess back-mutations. We also show that none of the methods provide strong statistical support for the placement of the root in any particular edge of the tree. Our results suggest that inferences on the origin and early spread of SARS-CoV-2 based on rooted trees should be interpreted with caution.
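To make the idea of clock-based rooting concrete, the following is a minimal sketch and not the authors' procedure. It scores every edge of a small unrooted tree as a candidate root position by regressing root-to-tip distance on sampling date, and it prefers the edge with the best positive linear fit. The tree, tip names, dates, and function names are hypothetical toy values. Placing the root at the midpoint of each edge is a simplifying assumption; real tools optimize the position along the edge.

```python
# Toy unrooted tree: undirected edges with branch lengths (hypothetical values).
EDGES = {
    ("X", "Y"): 2.0,
    ("X", "a"): 1.0,
    ("X", "b"): 2.0,
    ("Y", "c"): 1.5,
    ("Y", "d"): 3.0,
}
# Hypothetical sampling dates for the four tips (arbitrary time units).
DATES = {"a": 2.0, "b": 3.0, "c": 2.5, "d": 4.0}


def build_adjacency(edges):
    """Turn the edge dictionary into a symmetric adjacency map."""
    adj = {}
    for (u, v), length in edges.items():
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj


def distances_from(adj, source, blocked):
    """Path lengths from `source` to all nodes reachable without visiting `blocked`."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for nbr, length in adj[node].items():
            if nbr != blocked and nbr not in dist:
                dist[nbr] = dist[node] + length
                stack.append(nbr)
    return dist


def clock_fit(xs, ys):
    """Least-squares slope and R^2 of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r2


def score_root_edges(edges, dates):
    """Score each edge as a root position (root assumed at the edge midpoint)."""
    adj = build_adjacency(edges)
    scores = {}
    for (u, v), length in edges.items():
        side_u = distances_from(adj, u, blocked=v)
        side_v = distances_from(adj, v, blocked=u)
        xs, ys = [], []
        for tip, date in dates.items():
            below = side_u[tip] if tip in side_u else side_v[tip]
            xs.append(date)
            ys.append(length / 2 + below)
        slope, r2 = clock_fit(xs, ys)
        # A negative slope contradicts a molecular clock, so give it no support.
        scores[(u, v)] = r2 if slope > 0 else 0.0
    return scores


scores = score_root_edges(EDGES, DATES)
best = max(scores, key=scores.get)
print(best, scores[best])
```

A point estimate like `best` says nothing about uncertainty. One way to quantify it, again as an assumption rather than the method used here, would be to repeat the scoring on bootstrap-resampled alignments and report how often each edge is selected.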
