Comprehensive evolution and molecular characteristics of a large number of SARS-CoV-2 genomes reveal its epidemic trends

Objectives: To further reveal the phylogenetic evolution and whole-genome molecular characteristics of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) using a large number of genomes, and to provide a basis for the prevention and treatment of SARS-CoV-2 infection.

Methods: Various evolutionary analysis methods were employed.

Results: The estimated ratio of the rates of non-synonymous to synonymous changes (Ka/Ks) of SARS-CoV-2 was 1.008 and 1.094 based on 622 and 3624 SARS-CoV-2 genomes, respectively. Nine key specific sites in high linkage and four major haplotypes (H1, H2, H3 and H4) were found. The Ka/Ks values, detected population sizes and temporal trends of each major haplotype showed that the H3 and H4 subgroups were undergoing purifying selection and nearly disappeared after being detected, indicating that they might have existed for a long time. The H1 and H2 subgroups were undergoing nearly neutral or neutral evolution and increased globally over time. The frequency of H1 was generally high in Europe and correlated with the death rate (r > 0.37), suggesting that these two haplotypes might be related to the infectivity or pathogenicity of SARS-CoV-2.

Conclusions: Based on the evolution and epidemiology of 16373 SARS-CoV-2 genomes, several key specific sites and haplotypes related to the infectivity or pathogenicity of SARS-CoV-2, as well as a possible earlier origin time and place of the virus, were indicated.
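Ka/Ks compares the rate of amino-acid-changing substitutions per non-synonymous site with the rate of silent substitutions per synonymous site. Values below 1 are usually read as purifying selection, values near 1 as neutral evolution, and values above 1 as positive selection, which is the scale on which the values of about 1.0 above are interpreted. The following is a minimal sketch for one pair of aligned coding sequences using the classic Nei-Gojobori (1986) counting method with a Jukes-Cantor correction. It is an illustrative stand-in, not necessarily the model used in this study; the function names are hypothetical, the inputs are assumed to be in-frame, gap-free, upper- or lower-case ACGT only and free of stop codons, and, as a simplification, a mutation to a stop codon is counted as non-synonymous when counting sites.

```python
import itertools
import math

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(itertools.product(BASES, repeat=3))}


def syn_sites(codon):
    """Number of synonymous sites in a codon (0 to 3), Nei-Gojobori style."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b == codon[pos]:
                continue
            mutant = codon[:pos] + b + codon[pos + 1:]
            # Changes to a stop codon fall through as non-synonymous (simplification).
            if CODE[mutant] == CODE[codon]:
                s += 1.0 / 3.0
    return s


def codon_diffs(c1, c2):
    """Synonymous and non-synonymous differences between two codons,
    averaged over all mutational pathways that avoid stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    sd_total = nd_total = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        cur, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # this pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        if valid:
            sd_total += sd
            nd_total += nd
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"every pathway from {c1} to {c2} hits a stop codon")
    return sd_total / n_paths, nd_total / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differing sites."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(seq1, seq2):
    """Ka/Ks (dN/dS) for two aligned, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            raise ValueError(f"stop codon at nucleotide {i}")
        S += (syn_sites(c1) + syn_sites(c2)) / 2.0  # average over both sequences
        sd, nd = codon_diffs(c1, c2)
        Sd += sd
        Nd += nd
    N = len(seq1) - S  # each codon contributes 3 sites in total
    d_s = jukes_cantor(Sd / S if S > 0 else 0.0)
    d_n = jukes_cantor(Nd / N if N > 0 else 0.0)
    if d_s == 0:
        return math.inf if d_n > 0 else math.nan
    return d_n / d_s
```

For example, `ka_ks("ATGCTGAAA", "ATGCTAGAA")` compares sequences that differ by one synonymous change (CTG to CTA, both Leu) and one non-synonymous change (AAA to GAA, Lys to Glu) and returns roughly 0.12, because the single synonymous difference is spread over far fewer synonymous sites. Genome-wide estimates such as those reported above would pool many comparisons; how genomes were paired or pooled in this study is not stated here.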
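Because SARS-CoV-2 genomes are haploid, each genome directly yields a haplotype across the variable sites, and the linkage between two biallelic sites can be summarized with the standard measures D, Lewontin's |D'| and r², where r² = D² / (pA(1 - pA)pB(1 - pB)). The sketch below computes them from per-genome allele calls. The function names, the input layout (one string per genome, one character per variable site) and the use of the major allele as the reference allele are assumptions for illustration, not the study's pipeline.

```python
from collections import Counter


def pairwise_ld(alleles_i, alleles_j):
    """Return (D, |D'|, r^2) for two biallelic sites.

    alleles_i[k] and alleles_j[k] are the alleles carried by genome k
    at site i and site j (haploid genomes, so no phasing is needed).
    """
    if len(alleles_i) != len(alleles_j) or len(alleles_i) == 0:
        raise ValueError("need the same, non-zero number of genomes at both sites")
    n = len(alleles_i)
    count_i, count_j = Counter(alleles_i), Counter(alleles_j)
    if len(count_i) != 2 or len(count_j) != 2:
        raise ValueError("both sites must be biallelic")
    # Use the major (most common) allele at each site as the reference allele.
    a = count_i.most_common(1)[0][0]
    b = count_j.most_common(1)[0][0]
    p_a, p_b = count_i[a] / n, count_j[b] / n
    p_ab = sum(1 for x, y in zip(alleles_i, alleles_j) if x == a and y == b) / n
    d = p_ab - p_a * p_b
    # Lewontin's normalisation: D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def ld_matrix(haplotypes):
    """r^2 for every pair of sites; haplotypes is a list of equal-length
    strings, one per genome, one character per variable site."""
    n_sites = len(haplotypes[0])
    columns = [[h[k] for h in haplotypes] for k in range(n_sites)]
    return {
        (i, j): pairwise_ld(columns[i], columns[j])[2]
        for i in range(n_sites)
        for j in range(i + 1, n_sites)
    }
```

|D'| reaches 1 whenever one of the four possible two-site haplotypes is absent, whereas r² reaches 1 only when the two sites also have matching allele frequencies, which is why both measures are usually reported when identifying tightly linked sites.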
