A mixture model for determining SARS-CoV-2 variant composition in pooled samples

Despite the rapid development of highly effective vaccines to control the current COVID-19 pandemic, the unequal worldwide distribution and availability of these vaccines, together with the large number of people infected, lead to the continuous emergence of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) variants of concern. Real-time genomic surveillance will therefore likely remain necessary as a continuous monitoring tool to follow the spillover and spread of the disease and the evolution of the virus. In this context, the possible emergence of new genomic variants of SARS-CoV-2 in response to selective pressure, including variants refractory to current vaccines, makes genomic surveillance programs tools of utmost importance. Here we propose a statistical model for estimating the relative frequencies of SARS-CoV-2 variants in pooled samples. The model is built on a previously defined selection of genomic polymorphisms that characterize SARS-CoV-2 variants. The methods described here support both raw sequencing reads, from which polymorphism-based markers are called, and predefined markers in VCF format. Results obtained with simulated data show that our method is effective in recovering the correct variant proportions. Further, results obtained from longitudinal wastewater samples from two locations in Switzerland agree well with those describing the epidemiological evolution of COVID-19 variants in clinical samples from the same locations. Our results show that the described method can be a valuable tool for tracking the proportions of SARS-CoV-2 variants.
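
To make the idea of estimating variant proportions from marker polymorphisms concrete, the following is a minimal sketch of one possible mixture formulation. It is not the model of the paper, whose details are not given in this abstract. It assumes a hypothetical binary signature matrix (which variant carries which marker), per-marker alternate-allele counts and read depths, a fixed sequencing error rate, and a plain expectation-maximization (EM) update of the mixing proportions. The function name, parameters, and example data are all illustrative.

```python
def estimate_variant_proportions(signatures, alt_counts, depths,
                                 error_rate=0.01, n_iter=500, tol=1e-10):
    """EM estimate of variant mixing proportions (illustrative sketch only).

    signatures[k][m] is 1 if hypothetical variant k carries the alternate
    allele at marker m, else 0. alt_counts[m] and depths[m] are the number
    of alternate-allele reads and the total reads observed at marker m.
    """
    n_variants = len(signatures)
    n_markers = len(alt_counts)
    # Probability that a read drawn from variant k shows the alternate
    # allele at marker m, allowing for a fixed (assumed) error rate.
    p_alt = [[(1.0 - error_rate) if signatures[k][m] else error_rate
              for m in range(n_markers)] for k in range(n_variants)]
    pi = [1.0 / n_variants] * n_variants  # start from uniform proportions
    total_reads = float(sum(depths))

    for _ in range(n_iter):
        expected = [0.0] * n_variants
        for m in range(n_markers):
            ref_count = depths[m] - alt_counts[m]
            mix_alt = sum(pi[k] * p_alt[k][m] for k in range(n_variants))
            mix_ref = 1.0 - mix_alt
            for k in range(n_variants):
                # E-step: expected number of reads at marker m that
                # originate from variant k, split by observed allele.
                expected[k] += alt_counts[m] * pi[k] * p_alt[k][m] / mix_alt
                expected[k] += ref_count * pi[k] * (1.0 - p_alt[k][m]) / mix_ref
        # M-step: proportions are the expected share of all reads.
        new_pi = [e / total_reads for e in expected]
        change = max(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if change < tol:
            break
    return pi


# Toy example: three made-up variants, each defined by two private markers.
signatures = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
depths = [1000] * 6
alt_counts = [598, 598, 304, 304, 108, 108]  # consistent with a 60/30/10 mix

proportions = estimate_variant_proportions(signatures, alt_counts, depths)
print([round(p, 3) for p in proportions])  # close to 0.6, 0.3, 0.1
```

In this sketch the markers are private to one variant each, which keeps the toy example easy to check by hand; real variant definitions share markers, and confidence intervals for the estimated proportions would need an additional step such as resampling.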
