Coronavirus GenBrowser for monitoring the transmission and evolution of SARS-CoV-2

Abstract: Genomic epidemiology is important for studying the COVID-19 pandemic, and more than two million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic sequences have been deposited in public databases. However, the exponential increase in sequences poses unprecedented bioinformatic challenges. Here, we present the Coronavirus GenBrowser (CGB), which is based on a highly efficient analysis framework and a node-picking rendering strategy. In total, 1,002,739 high-quality genomic sequences with transmission-related metadata were analyzed and visualized. The core data file is only 12.20 MB, which makes sharing clean data highly efficient. Quick visualization modules and rich interactive operations are provided for exploring the annotated SARS-CoV-2 evolutionary tree. A CGB binary nomenclature is proposed to name each internal lineage. The pre-analyzed data can be filtered according to user-defined criteria to explore the transmission of SARS-CoV-2. Different evolutionary analyses can also be performed easily, such as the detection of accelerated evolution and ongoing positive selection. Moreover, 75 genomic spots that are conserved in SARS-CoV-2 but not conserved in other coronaviruses were identified; these may indicate functional elements specifically important for SARS-CoV-2. The CGB is written in Java and JavaScript. It not only enables users with no programming skills to analyze millions of genomic sequences, but also offers a panoramic view of the transmission and evolution of SARS-CoV-2.
