The Global Landscape of SARS-CoV-2 Genomes, Variants, and Haplotypes in 2019nCoVR

On 22 January 2020, the National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), created the 2019 Novel Coronavirus Resource (2019nCoVR), an open-access SARS-CoV-2 information resource. 2019nCoVR integrates sequence and clinical information for all publicly available SARS-CoV-2 isolates, which are manually curated with value-added annotations and quality-evaluated by our in-house automated pipeline. Of particular note, 2019nCoVR performs systematic analyses to generate a dynamic landscape of SARS-CoV-2 genomic variations at a global scale. It provides all identified variants and detailed statistics for each virus isolate, and aggregates the quality score, functional annotation, and population frequency for each variant. It also visualizes the spatiotemporal change of each variant and builds historical viral haplotype network maps over the course of the outbreak from all complete and high-quality genomes. Moreover, 2019nCoVR provides a full collection of literature on SARS-CoV-2 and COVID-19 (Coronavirus Disease 2019), including published papers from PubMed as well as preprints from services such as bioRxiv and medRxiv through Europe PMC. Furthermore, by linking with relevant databases in CNCB-NGDC, 2019nCoVR offers data submission services for raw sequence reads and assembled genomes, and data sharing with the National Center for Biotechnology Information. Collectively, all SARS-CoV-2 genome sequences, variants, haplotypes, and literature are updated daily to provide timely information, making 2019nCoVR a valuable resource for the global research community. 2019nCoVR is accessible at https://bigd.big.ac.cn/ncov/.
