SSSCPreds: Deep Neural Network-Based Software for the Prediction of Conformational Variability and Application to SARS-CoV-2

Amino acid mutations that improve protein stability and rigidity can accompany increases in binding affinity. Therefore, conserved amino acids located on a protein surface may be successfully targeted by antibodies. Quantitative deep mutational scanning is an excellent technique for understanding viral evolution, and the resulting data can be used to develop a vaccine. However, applying the approach to proteins in general is difficult because of its cost. To address this difficulty, we report the construction of a deep neural network-based program for sequence-based prediction of supersecondary structure codes (SSSCs), called SSSCPrediction (SSSCPred). Further, to predict conformational flexibility or rigidity in proteins, we have also developed a comparison program, SSSCPreds, which consists of three deep neural network-based prediction systems (SSSCPred, SSSCPred100, and SSSCPred200). The calculation presented here, performed with our algorithms, shows the degree of flexibility of the receptor-binding motif of the SARS-CoV-2 spike protein and the rigidity of the unique motif (SSSC: SSSHSSHHHH) at the S2 subunit, and has a value independent of the X-ray and cryo-EM structures. The sequence flexibility/rigidity map of the SARS-CoV-2 RBD resembles the sequence-to-phenotype maps of ACE2-binding affinity and expression that were obtained experimentally by deep mutational scanning. This resemblance suggests that the SSSC sequences predicted identically by the three deep neural network-based systems correlate well with the sequences that show both lower ACE2-binding affinity and lower expression. The combined analysis of predicted and observed SSSCs with keyword-tagged datasets would be helpful in understanding the structural correlation to the examined system.
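The comparison idea behind SSSCPreds can be illustrated with a minimal sketch. It assumes that each of the three predictors returns an SSSC string of the same length as the input sequence, and that positions where all three agree are read as candidates for rigidity while disagreement is read as possible flexibility. The function names, the `*` marker, and the example strings below are hypothetical and are not taken from the SSSCPreds software.

```python
def agreement_map(pred_a: str, pred_b: str, pred_c: str, marker: str = "*") -> str:
    """Compare three predicted SSSC strings position by position.

    Returns a string that keeps the code letter where all three
    predictions agree and puts `marker` where they differ.
    (Hypothetical helper; not part of the published software.)
    """
    if not (len(pred_a) == len(pred_b) == len(pred_c)):
        raise ValueError("all three predictions must have the same length")
    return "".join(
        a if a == b == c else marker
        for a, b, c in zip(pred_a, pred_b, pred_c)
    )


def agreement_fraction(consensus: str, marker: str = "*") -> float:
    """Fraction of positions at which the three predictions agree."""
    if not consensus:
        return 0.0
    return sum(ch != marker for ch in consensus) / len(consensus)


# Illustrative inputs only: the first two strings reuse the motif quoted
# in the abstract; the third is an invented variant for demonstration.
pred_1 = "SSSHSSHHHH"
pred_2 = "SSSHSSHHHH"
pred_3 = "SSSHTSHHHH"

consensus = agreement_map(pred_1, pred_2, pred_3)
print(consensus)                      # SSSH*SHHHH
print(agreement_fraction(consensus))  # 0.9
```

A window with a high agreement fraction would, under this assumption, be flagged as a rigid segment; how SSSCPreds actually scores and reports agreement is not described in the abstract.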
