Rhapsody: predicting the pathogenicity of human missense variants

Abstract

Motivation: The biological effects of human missense variants have been studied experimentally for decades but predicting their effects in clinical molecular diagnostics remains challenging. Available computational tools are usually based on the analysis of sequence conservation and structural properties of the mutant protein. We recently introduced a new machine learning method that demonstrated for the first time the significance of protein dynamics in determining the pathogenicity of missense variants.

Results: Here, we present a new interface (Rhapsody) that enables fully automated assessment of pathogenicity, incorporating both sequence coevolution data and structure- and dynamics-based features. Benchmarked against a dataset of about 20 000 annotated variants, the methodology is shown to outperform well-established and/or advanced prediction tools. We illustrate the utility of Rhapsody by in silico saturation mutagenesis studies of human H-Ras, phosphatase and tensin homolog and thiopurine S-methyltransferase.

Availability and implementation: The new tool is available both as an online webserver at http://rhapsody.csb.pitt.edu and as an open-source Python package (GitHub repository: https://github.com/prody/rhapsody; PyPI package installation: pip install prody-rhapsody). Links to additional resources, tutorials and package documentation are provided in the 'Python package' section of the website.

Supplementary information: Supplementary data are available at Bioinformatics online.
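As an illustration of what an in silico saturation mutagenesis study enumerates, the following minimal Python sketch lists every single amino acid substitution along a sequence. It does not call the Rhapsody package. The function name `saturation_mutagenesis`, the space-separated "accession position wild-type mutant" output layout, the four-residue toy fragment and the accession string are all assumptions made for this example, not details taken from the abstract; consult the package documentation for the input format Rhapsody actually expects.

```python
# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_mutagenesis(sequence, accession, start=1):
    """Yield one 'accession position wt mut' string per possible substitution.

    Hypothetical helper: the output layout is an assumption for illustration.
    """
    for offset, wt in enumerate(sequence):
        if wt not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue {wt!r} at position {start + offset}")
        for mut in AMINO_ACIDS:
            if mut != wt:  # skip the wild-type residue itself
                yield f"{accession} {start + offset} {wt} {mut}"


# Toy four-residue fragment and an illustrative accession (both assumptions).
variants = list(saturation_mutagenesis("MTEY", "P01112"))
print(len(variants))   # 19 substitutions per position
print(variants[0])
```

Each position contributes 19 variants, so a protein of N residues yields 19 × N candidate substitutions to score.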
