Large-scale sequencing of SARS-CoV-2 genomes from one region allows detailed epidemiology and enables local outbreak management

The COVID-19 pandemic has spread rapidly throughout the world. In the UK, the initial peak was in April 2020; in the county of Norfolk (UK) and surrounding areas, which have a stable, low-density population, over 3,200 cases were reported between March and August 2020. As part of the activities of the national COVID-19 Genomics Consortium (COG-UK), we undertook whole genome sequencing of the SARS-CoV-2 genomes present in positive clinical samples from the Norfolk region. These samples were collected by four major hospitals, multiple minor hospitals, care facilities and community organisations within Norfolk and surrounding areas. We combined clinical metadata with the sequencing data from regional SARS-CoV-2 genomes to understand the origins, genetic variation, transmission and spread of the virus within the region and to provide national context. Data were fed back into the national effort for pandemic management, whilst simultaneously being used to assist local outbreak analyses. Overall, 1,565 positive samples (172 per 100,000 population) from 1,376 cases were evaluated; for 140 cases, between two and six samples were available, providing longitudinal data. This represented 42.6% of all positive samples identified by hospital testing in the region and encompassed those with clinical need, as well as health and care workers and their families. In total, 1,035 cases had genome sequences of sufficient quality to provide phylogenetic lineages. These genomes belonged to 26 distinct global lineages, indicating that there were multiple separate introductions into the region. Furthermore, 100 genetically distinct UK lineages were detected, demonstrating local evolution at a rate of ~2 SNPs per month and multiple co-occurring lineages as the pandemic progressed.
Our analysis identified a sublineage associated with 6 care facilities; found no evidence of reinfection in longitudinal samples; ruled out a nosocomial outbreak; identified 16 lineages in key workers that were not found in patients, indicating that infection control measures were effective; found that the D614G spike protein mutation, which is linked to increased transmissibility, dominated the samples; and rapidly confirmed the relatedness of cases in an outbreak at a food processing facility. The large-scale genome sequencing of SARS-CoV-2-positive samples has provided valuable additional data for public health epidemiology in the Norfolk region, and will continue to help identify and untangle hidden transmission chains as the pandemic evolves.
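As an illustration of how relatedness between cases might be assessed from sequence data, the sketch below counts SNP differences between two aligned consensus genomes and compares the result with the number of SNPs expected under the ~2 SNPs per month rate reported above. This is a minimal sketch and not the pipeline used in this study: the function names, the 30-day month, and the choice to skip ambiguous bases and gaps are all assumptions made for the example.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ.

    Assumption: positions where either sequence has an ambiguous
    base (e.g. N) or a gap are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in unambiguous and b in unambiguous and a != b
    )


def expected_snps(days_apart: float, snps_per_month: float = 2.0) -> float:
    """Rough expected SNP count for a given sampling interval.

    Assumption: a constant rate and a 30-day month; the default of
    2.0 reflects the approximate rate quoted in the abstract.
    """
    return snps_per_month * days_apart / 30.0


# Hypothetical toy sequences, far shorter than a real ~30 kb genome.
observed = snp_distance("ACGTN", "ACCTA")
print(observed, expected_snps(60))
```

In practice, two samples whose SNP distance is well above what the sampling interval would predict are unlikely to belong to the same recent transmission chain, although any threshold would need to be chosen with the full phylogenetic context in mind.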
