Rhapsody: Pathogenicity prediction of human missense variants based on protein sequence, structure and dynamics

The biological effects of human missense variants have been studied experimentally for decades, but predicting their effects in clinical molecular diagnostics remains challenging. Available computational tools are usually based on the analysis of sequence conservation and of the structural properties of the mutant protein. We recently introduced a machine learning method that demonstrated, for the first time, the significance of protein dynamics in determining the pathogenicity of missense variants. Here we present a significant extension that integrates coevolutionary data from the Pfam database, and we introduce a new interface, Rhapsody, that enables fully automated assessment of pathogenicity. Benchmarked against a dataset of about 20,000 annotated variants, the methodology outperforms well-established and/or advanced prediction tools. We illustrate the utility of our approach with an in silico saturation mutagenesis study of human H-Ras. The tool is available both as a webtool (rhapsody.csb.pitt.edu) and as an open source Python package (pip install prody-rhapsody).
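In silico saturation mutagenesis means scoring every possible single amino-acid substitution at every position of a protein. The minimal sketch below only enumerates that variant set; it does not call Rhapsody. The helper name `saturation_variants` and the "ACCESSION POSITION WT MUT" query format (1-based positions) are assumptions for illustration, not the documented API of the prody-rhapsody package.

```python
from typing import Iterator

# The 20 standard amino acids (one-letter codes)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_variants(accession: str, sequence: str) -> Iterator[str]:
    """Yield every single amino-acid substitution of `sequence`.

    Hypothetical helper: each variant is formatted as
    'ACCESSION POSITION WT MUT' with 1-based positions (assumed format).
    """
    for pos, wt in enumerate(sequence, start=1):
        for mut in AMINO_ACIDS:
            if mut != wt:  # skip the synonymous "substitution"
                yield f"{accession} {pos} {wt} {mut}"


# First 10 residues of human H-Ras (UniProt P01112), for illustration only
fragment = "MTEYKLVVVG"
variants = list(saturation_variants("P01112", fragment))
print(len(variants))  # 10 positions x 19 substitutions = 190
print(variants[0])    # P01112 1 M A
```

Each position contributes 19 substitutions, so the full 189-residue H-Ras yields 189 × 19 = 3,591 variants to be scored.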
