Whole Genome Sequencing of SARS-CoV-2: Adapting Illumina Protocols for Quick and Accurate Outbreak Investigation during a Pandemic

The COVID-19 pandemic spread rapidly around the world. A few days after the first case was detected in South Africa, an infection started a large hospital outbreak in Durban, KwaZulu-Natal (KZN). Phylogenetic analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes can be used to trace the path of transmission within a hospital. It can also identify the source of the outbreak and provide lessons for improving infection prevention and control strategies. This manuscript outlines the obstacles encountered in genotyping SARS-CoV-2 in near-real time during an urgent outbreak investigation. These included the length of the original genotyping protocol, the unavailability of reagents, and problems with sample degradation and storage. Despite these obstacles, three different library preparation methods for Illumina sequencing were set up, and the hands-on library preparation time was reduced from twelve to three hours, which enabled the outbreak investigation to be completed in just a few weeks. Furthermore, the new protocols increased the success rate of sequencing whole viral genomes. A simple bioinformatics workflow for assembling high-quality genomes in near-real time was also fine-tuned. To allow other laboratories to learn from our experience, all of the library preparation and bioinformatics protocols are publicly available at protocols.io and have been distributed to other laboratories of the Network for Genomics Surveillance in South Africa (NGS-SA) consortium.
