Evolution and molecular characteristics of SARS-CoV-2 genome

In an evolutionary analysis of 622 high-quality complete human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes, the estimated Ka/Ks ratio of SARS-CoV-2 was 1.008, significantly higher than that of SARS-CoV and MERS-CoV, and the time to the most recent common ancestor (tMRCA) of SARS-CoV-2 was inferred to be late September 2019 (95% CI: 2019/08/28-2019/10/26). Together, these results indicate that SARS-CoV-2 may have already undergone positive selection during cross-host evolution in the early stage and may currently be undergoing neutral evolution. In addition, an unrooted phylogenetic tree of the 622 SARS-CoV-2 genomes was constructed by maximum likelihood (ML) with a bootstrap value of 100. According to the phylogenetic tree, all genomes were divided into Clusters 1 to 3, whose genomes came mainly from North America, worldwide, and Europe, respectively. We further identified 9 key specific sites in high linkage that play a decisive role in the classification of each cluster. Among them, 3 and 4 sites in almost complete linkage are the specific sites for Cluster 1 and Cluster 3, respectively. Notably, based on more than 3500 SARS-CoV-2 genomes, the frequencies of haplotypes TTTG and H1 were generally high in European countries and correlated with death rate (r > 0.4), which suggests that these haplotypes might be related to the pathogenicity of SARS-CoV-2 and warrant further attention. Following haplotype changes in chronological order, the H3 haplotype subgroup disappeared soon after detection, whereas the H1 haplotype subgroup increased globally over time. The evolutionary and molecular characteristics of more than 3500 genomic sequences provide a new perspective for revealing the epidemiological mechanism of SARS-CoV-2 and for coping with SARS-CoV-2 effectively.
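The haplotype-frequency correlation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the site positions, the toy sequences, and the death-rate values are all hypothetical and chosen only to show the shape of the calculation (extract the bases at a set of linked sites, compute the per-country frequency of a target haplotype such as TTTG, then compute a Pearson correlation with a per-country death rate).

```python
from collections import Counter
from math import sqrt


def haplotype(seq, sites):
    """Concatenate the bases at the given 0-based alignment positions."""
    return "".join(seq[i] for i in sites)


def haplotype_frequency(records, sites, target):
    """Return the per-country fraction of genomes carrying `target`.

    `records` is an iterable of (country, aligned_sequence) pairs.
    """
    totals, hits = Counter(), Counter()
    for country, seq in records:
        totals[country] += 1
        if haplotype(seq, sites) == target:
            hits[country] += 1
    return {c: hits[c] / totals[c] for c in totals}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical toy data: positions, sequences and death rates are invented
# for illustration and do not come from the study.
SITES = [1, 3, 5, 7]
RECORDS = [
    ("A", "ATCTGTAG"), ("A", "ACCCGCAT"),
    ("B", "ATCTGTAG"), ("B", "ATCTGTAG"),
    ("C", "ACCCGCAT"), ("C", "ACCCGCAT"),
]
DEATH_RATE = {"A": 0.04, "B": 0.08, "C": 0.02}

freq = haplotype_frequency(RECORDS, SITES, "TTTG")
countries = sorted(freq)
r = pearson_r([freq[c] for c in countries],
              [DEATH_RATE[c] for c in countries])
print(freq, round(r, 3))
```

With real data, the records would come from a multiple sequence alignment and the death rates from epidemiological reports; a correlation of this kind does not by itself establish a causal link to pathogenicity.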
