CoVigator—A Knowledge Base for Navigating SARS-CoV-2 Genomic Variants

Background: The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) resulted in the global COVID-19 pandemic. The urgent need for an effective SARS-CoV-2 vaccine led to the development of the first series of vaccines at unprecedented speed. The discovery of SARS-CoV-2 spike-glycoprotein mutants, however, and with it the potential for escape from vaccine-induced protection and for increased infectivity, demonstrates the continuing importance of monitoring SARS-CoV-2 mutations to enable early detection and tracking of genomic variants of concern.

Results: We developed the CoVigator tool with three components: (1) a knowledge base that collects new SARS-CoV-2 genomic data, processes it and stores the results; (2) a comprehensive variant calling pipeline; (3) an interactive dashboard highlighting the most relevant findings. The knowledge base routinely downloads and processes virus genome assemblies from the COVID-19 Data Portal (C19DP) and raw sequencing data from the European Nucleotide Archive (ENA). The results of variant calling are visualized through the dashboard as tables and customizable graphs, making it a versatile tool for tracking SARS-CoV-2 variants. We put special emphasis on the identification of intrahost mutations and make available to the community what is, to the best of our knowledge, the largest dataset on SARS-CoV-2 intrahost mutations. In the spirit of open data, all CoVigator results are available for download. The CoVigator dashboard is accessible via covigator.tron-mainz.de.

Conclusions: With worldwide demand for genomic surveillance to track the spread of SARS-CoV-2 increasing, CoVigator will be a valuable resource providing an up-to-date list of mutations that can be incorporated into global efforts.
