Comprehensive evolution and molecular characteristics of a large number of SARS-CoV-2 genomes revealed its epidemic trend and possible origins

Objectives: SARS-CoV-2 has infected millions of people and spread rapidly around the world. This study aimed to reveal its epidemic trend and possible origins by exploring the evolution and molecular characteristics of a large number of SARS-CoV-2 genomes.

Methods: Various evolutionary analysis methods were employed.

Results: The estimated Ka/Ks ratio of SARS-CoV-2 was 1.008 based on 622 genomes and 1.094 based on 3,624 genomes. The time to the most recent common ancestor (tMRCA) was inferred to be late September 2019. Furthermore, nine key sites in strong linkage and four major haplotypes (H1, H2, H3 and H4) were identified. The Ka/Ks ratio, detected population size and temporal trend of each major haplotype showed that the H3 and H4 subgroups were undergoing purifying selection and had almost disappeared after detection, suggesting that H3 and H4 may have existed for a long time. In contrast, the H1 and H2 subgroups were undergoing near-neutral or neutral evolution and increased globally over time. Notably, the frequency of H1 was generally high in Europe and correlated with the death rate (r > 0.37).

Conclusions: By analyzing the evolution and molecular characteristics of more than 16,000 genomic sequences, this study provides a new perspective on the epidemiology of SARS-CoV-2.
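The abstract interprets Ka/Ks values near 1 as neutral evolution and values below 1 as purifying selection, but it does not state which estimator produced them. To make the quantity concrete, here is a minimal Python sketch of the classic Nei-Gojobori (1986) counting method with a Jukes-Cantor correction, applied to one pair of aligned, in-frame coding sequences. This is an illustrative assumption, not the study's pipeline; dedicated tools typically use more elaborate substitution models. The function names are hypothetical, and treating mutations to stop codons as nonsynonymous when counting sites is a simplifying choice.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i]
        for i, c in enumerate(itertools.product(BASES, repeat=3))}


def syn_sites(codon):
    """Expected synonymous sites in a codon (mutations to stop count as nonsynonymous)."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over all mutational
    pathways from c1 to c2; pathways through a stop codon are dropped unless
    every pathway passes through one."""
    diff_pos = [i for i in range(3) if c1[i] != c2[i]]
    if not diff_pos:
        return 0.0, 0.0
    valid, all_paths = [], []
    for order in itertools.permutations(diff_pos):
        sd = nd = 0
        through_stop = False
        current = c1
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            through_stop = through_stop or CODE[nxt] == "*"
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        all_paths.append((sd, nd))
        if not through_stop:
            valid.append((sd, nd))
    paths = valid or all_paths
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Jukes-Cantor multiple-hit correction; saturates (inf) when p >= 0.75."""
    if p <= 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks, Ka/Ks) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps or ambiguous bases, and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no comparable codons")
    ka = jukes_cantor(Nd / N)
    ks = jukes_cantor(Sd / S)
    return ka, ks, (ka / ks if ks > 0 else math.nan)
```

For example, `ka_ks("CTTGGGGGGGGG", "CTCGGGGGGGGG")` sees only a synonymous CTT to CTC (Leu to Leu) change, so Ka is 0 and Ka/Ks is 0, the signature of strong purifying selection. A ratio near 1 is consistent with neutral evolution, and a ratio above 1 suggests positive selection.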
