An interactive viral genome evolution network analysis system enabling rapid large-scale molecular tracing of SARS-CoV-2

Comprehensive analyses of viral genomes can provide a global picture of SARS-CoV-2 transmission and help predict upcoming trends of the pandemic. This molecular tracing is mainly conducted through extensive phylogenetic network analyses. However, the rapid accumulation of SARS-CoV-2 genomes has produced data of unprecedented size and complexity, exceeding the capacity of existing methods to construct an evolution network through virus genotyping. Here we report a Viral genome Evolution Network Analysis System (VENAS), which uses Hamming distances adjusted by the minor allele frequency to construct a viral genome evolution network. The resulting network is topologically clustered and divided using a community detection algorithm, and potential evolution paths are further inferred with a network disassortativity trimming algorithm. We also employ parallel computing to achieve rapid processing and interactive visualization of >10,000 viral genomes, enabling accurate detection and subtyping of viral mutations through different stages of the COVID-19 pandemic. In particular, several core viral mutations can be independently identified and linked to early transmission events in the COVID-19 pandemic. As a general platform for comprehensive viral genome analysis, VENAS serves as a useful computational tool for the current and future pandemics.
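
To make the distance step concrete, the following is a minimal Python sketch of a Hamming distance adjusted by minor allele frequency (MAF). The abstract does not specify how the adjustment is applied, so the weighting here is an assumption chosen only for illustration: each differing site contributes its MAF, so that sites where the variant is rare (and possibly a sequencing error) count less than sites with a well-supported variant. The function names are hypothetical and are not taken from VENAS, and the input is assumed to be aligned sequences of equal length.

```python
from collections import Counter
from itertools import combinations


def minor_allele_frequencies(seqs):
    """Per-site MAF: frequency of the second most common base (0.0 if invariant)."""
    n = len(seqs)
    mafs = []
    for column in zip(*seqs):  # iterate over alignment columns
        counts = Counter(column).most_common()
        mafs.append(counts[1][1] / n if len(counts) > 1 else 0.0)
    return mafs


def weighted_hamming(a, b, weights):
    """Sum the per-site weights over positions where the two sequences differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def pairwise_distances(seqs):
    """MAF-weighted distance for every pair of sequences (illustrative weighting)."""
    weights = minor_allele_frequencies(seqs)
    return {
        (i, j): weighted_hamming(seqs[i], seqs[j], weights)
        for i, j in combinations(range(len(seqs)), 2)
    }


if __name__ == "__main__":
    # Toy alignment of four hypothetical genomes, four sites each.
    toy = ["AAAA", "AAAT", "AACT", "AACT"]
    for pair, dist in sorted(pairwise_distances(toy).items()):
        print(pair, dist)
```

In a network built from such distances, each genome would be a node and short distances would define candidate edges; identical genomes (distance 0) would collapse into a single genotype. The pairwise loop is quadratic in the number of genomes, which is the part a parallel implementation would need to distribute.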
