Novel Immunoglobulin Domain Proteins Provide Insights into Evolution and Pathogenesis of SARS-CoV-2-Related Viruses

ABSTRACT A novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), was recently identified as the causative agent of the coronavirus disease 2019 (COVID-19) outbreak that has generated a global health crisis. We use a combination of genomic analysis and sensitive profile-based sequence and structure analysis to understand the potential pathogenesis determinants of this virus. As a result, we identify several fast-evolving genomic regions that might be at the interface of virus-host interactions, corresponding to the receptor binding domain of the Spike protein, the three tandem Macro fold domains in ORF1a, and the uncharacterized protein ORF8. Further, we show that ORF8 and several other proteins from alpha- and beta-CoVs belong to novel families of immunoglobulin (Ig) proteins. Among them, ORF8 is distinguished by being rapidly evolving, possessing a unique insert, and having a hypervariable position among SARS-CoV-2 genomes in its predicted ligand-binding groove. We also uncover numerous Ig domain proteins from several unrelated metazoan viruses, which are distinct in sequence and structure but share architectures comparable to those of the CoV Ig domain proteins. Hence, we propose that SARS-CoV-2 ORF8 and other previously unidentified CoV Ig domain proteins fall under the umbrella of a widespread strategy of deployment of Ig domain proteins in animal viruses as pathogenicity factors that modulate host immunity. The rapid evolution of the ORF8 Ig domain proteins points to a potential evolutionary arms race between viruses and hosts, likely arising from immune pressure, and suggests a role in transmission between distinct host species.

IMPORTANCE The ongoing COVID-19 pandemic strongly emphasizes the need for a more complete understanding of the biology and pathogenesis of its causative agent, SARS-CoV-2. Despite intense scrutiny, several proteins encoded by the genomes of SARS-CoV-2 and other SARS-like coronaviruses remain enigmatic. Moreover, the high infectivity of SARS-CoV-2 and its severity in certain individuals currently make wet-lab studies challenging. In this study, we used a series of computational strategies to identify several fast-evolving regions of SARS-CoV-2 proteins that are potentially under host immune pressure. Most notably, the hitherto-uncharacterized protein encoded by ORF8 is one of them. Using sensitive sequence and structural analysis methods, we show that ORF8 and several other proteins from alpha- and beta-coronaviruses comprise novel families of immunoglobulin domain proteins, which might function as immune modulators that delay or attenuate the host immune response against the viruses.
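As an illustration of what flagging a fast-evolving region can look like in practice, the following is a minimal sketch that scores each sliding window of a multiple sequence alignment by its mean per-column Shannon entropy. It is not the authors' method, which the abstract does not specify at this level of detail. The function names, the window size, and the toy sequences are all assumptions made for illustration, and the sequences are invented rather than taken from any coronavirus genome.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def windowed_entropy(alignment, window=3):
    """Mean per-column entropy for every sliding window across the alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    per_column = [
        column_entropy([seq[i] for seq in alignment]) for i in range(length)
    ]
    return [
        sum(per_column[start:start + window]) / window
        for start in range(length - window + 1)
    ]


# Toy alignment: hypothetical sequences that differ only at one column.
toy_alignment = [
    "MKFLVACDE",
    "MKFLVSCDE",
    "MKFLVTCDE",
    "MKFLVGCDE",
]
scores = windowed_entropy(toy_alignment, window=3)
# Index of the first window with the highest mean entropy.
most_variable_start = max(range(len(scores)), key=scores.__getitem__)
```

Windows with high scores mark alignment stretches where the sequences disagree most. Entropy alone does not distinguish immune-driven selection from relaxed constraint, so a real analysis would pair it with substitution-rate or selection tests on a properly curated alignment.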
