A Distinct Phylogenetic Cluster of Indian Severe Acute Respiratory Syndrome Coronavirus 2 Isolates

Abstract

Background: From an isolated epidemic, coronavirus disease 2019 has now emerged as a global pandemic. The availability of genomes in the public domain after the epidemic provides a unique opportunity to understand the evolution and spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across the globe.

Methods: We performed whole-genome sequencing of 303 Indian isolates and analyzed them in the context of publicly available data from India.

Results: We describe a distinct phylogenetic cluster (Clade I/A3i) of SARS-CoV-2 genomes from India, which encompasses 22% of all genomes deposited in the public domain from India. Globally, approximately 2% of genomes, which to date could not be mapped to any distinct known cluster, fall within this clade.

Conclusions: The cluster is characterized by a core set of 4 genetic variants and has a nucleotide substitution rate of 1.1 × 10⁻³ variants per site per year, which is lower than that of the prevalent A2a cluster. Epidemiological assessments suggest that the common ancestor emerged at the end of January 2020 and possibly resulted in an outbreak followed by countrywide spread. To the best of our knowledge, this is the first comprehensive study characterizing this cluster of SARS-CoV-2 in India.
