Topology identifies emerging adaptive mutations in SARS-CoV-2

The COVID-19 pandemic has led to a worldwide effort to characterize the evolution of the coronavirus SARS-CoV-2 by mapping mutations in its genome. As the virus spreads and evolves, it acquires new mutations that could have important public health consequences, including higher transmissibility, morbidity, mortality, and immune evasion. Ideally, we would like to quickly identify new mutations that could confer adaptive advantages on the evolving virus by leveraging the large number of available SARS-CoV-2 genomes. One way to identify adaptive mutations is to look for convergent mutations, that is, mutations at the same genomic position that arise independently. However, the large number of currently available genomes, more than a million at this moment, precludes the efficient use of phylogeny-based techniques. Here, we establish a fast and scalable Topological Data Analysis approach for the early warning and surveillance of emerging adaptive mutations of SARS-CoV-2 in the ongoing COVID-19 pandemic. Our method relies on a novel topological tool, based on persistent homology, for the analysis of viral genome datasets. It systematically identifies convergent events in viral evolution by their topological footprint alone and thus overcomes limitations of current phylogenetic inference techniques. This allows for an unbiased and rapid analysis of large viral datasets. We introduce a new topological measure for convergent evolution and apply it to the complete GISAID dataset as of February 2021, comprising 303,651 high-quality SARS-CoV-2 isolates taken from patients all over the world since the beginning of the pandemic. We compile a complete list of mutations showing topological signals of convergence. We find that topologically salient mutations in the receptor-binding domain appear in several variants of concern and are linked with an increase in infectivity and immune escape.
Moreover, for many adaptive mutations the topological signal precedes an increase in prevalence. We demonstrate that our method can identify emerging adaptive mutations at an early stage. By localizing topological signals in the dataset, we are able to extract geo-temporal information about the early occurrence of emerging adaptive mutations. The identification of these mutations can help to develop an alert system that monitors mutations of concern and guides experimentalists toward the study of specific circulating variants.
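
To illustrate what a "topological footprint" of convergence can look like, the following is a minimal sketch and not the authors' pipeline or their topological measure. It assumes toy two-site sequences, Hamming distance as the metric, and a hand-rolled Betti number computation over GF(2); the helper names `hamming`, `gf2_rank`, and `betti1` are hypothetical. When the same pair of mutations is reachable along two independent paths, the four sequences form a square in Hamming space, and the Vietoris–Rips complex contains a one-dimensional cycle at scale 1 that is filled in at scale 2. A purely sequential (tree-like) history produces no such cycle.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    rank = 0
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                rank += 1
                break
            row ^= pivots[high]  # eliminate the leading bit
    return rank


def betti1(seqs, r):
    """First Betti number (GF(2)) of the Vietoris-Rips complex at scale r."""
    n = len(seqs)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if hamming(seqs[i], seqs[j]) <= r]
    index = {e: k for k, e in enumerate(edges)}
    # Boundary of an edge: its two endpoints.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of a triangle: its three edges (present only if all exist).
    d2 = [(1 << index[(i, j)]) | (1 << index[(i, k)]) | (1 << index[(j, k)])
          for i, j, k in combinations(range(n), 3)
          if (i, j) in index and (i, k) in index and (j, k) in index]
    # dim H1 = dim ker(d1) - rank(d2) = (#edges - rank d1) - rank d2
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


# Toy data (assumed): both single mutants and the double mutant are observed,
# so the double mutant is reachable along two independent paths.
convergent = ["AA", "GA", "AG", "GG"]
# Toy data (assumed): mutations accumulate along a single lineage.
tree_like = ["AA", "GA", "GG"]

print(betti1(convergent, 1), betti1(convergent, 2))
print(betti1(tree_like, 1), betti1(tree_like, 2))
```

In this toy setting the cycle exists only between scales 1 and 2, which is the kind of short-lived feature a persistent homology barcode would record. For datasets of realistic size, a dedicated persistent homology library would be needed in place of this brute-force enumeration of triangles.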
