The PHA4GE SARS-CoV-2 Contextual Data Specification for Open Genomic Epidemiology

The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatic tools and resources, and advocate for greater openness, interoperability, accessibility and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a clear and present need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. We have therefore developed an extension to the INSDC pathogen package, providing a SARS-CoV-2 contextual data specification based on harmonisable, publicly available community standards. The specification can be implemented using a collection template, together with an array of protocols and tools that support the harmonisation and submission of sequence data and contextual information to public repositories. Well-structured, rich contextual data adds value, promotes reuse, and enables aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries about SARS-CoV-2 and COVID-19.
