An evolutionary portrait of the progenitor SARS-CoV-2 and its dominant offshoots in COVID-19 pandemic

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was identified as the cause of COVID-19 soon after the disease was first reported. Global sequencing of thousands of genomes has revealed many common genetic variants, which are key to unraveling the early evolutionary history of SARS-CoV-2 and tracking its global spread over time. However, our knowledge of fundamental events in the genome evolution and spread of this coronavirus remains incomplete and highly uncertain. A deep understanding of the contemporary evolution of SARS-CoV-2 is urgently needed, not only for a retrospective on how, when, and why COVID-19 emerged and spread, but also for creating remedies through science, technology, medicine, and public policy. Here, we present the heretofore cryptic mutational history, phylogeny, and dynamics of SARS-CoV-2 from an analysis of tens of thousands of high-quality genomes. The reconstructed mutational progression is highly concordant with the timing of coronavirus sampling dates. It predicts the genome sequence of the progenitor virus, whose earliest offspring, without any non-synonymous mutations, were still spreading worldwide months after the first reports of COVID-19. Mutations of the progenitor gave rise to seven dominant lineages that spread episodically, some of which likely arose in Europe and North America after the genesis of the ancestral lineages in China. Mutational barcoding establishes that North American coronaviruses harbor genome signatures different from those of coronaviruses prevalent in Europe and Asia, which have converged over time. These spatiotemporal patterns continue to evolve as the pandemic progresses and can be viewed live online.
