Prediction and interpretation of deleterious coding variants in terms of protein structural stability

The classification of human genetic variants as deleterious or neutral is challenging, and its complexity is rooted in the large variety of biophysical mechanisms that can cause disease. For non-synonymous mutations in structured proteins, one of these mechanisms is a change in protein stability, which can lead to loss of protein structure or function. We developed a stability-driven, knowledge-based classifier that uses protein structure, artificial neural networks and solvent accessibility-dependent combinations of statistical potentials to predict whether destabilizing or stabilizing mutations are disease-causing. Our predictor yields a balanced accuracy of 71% in cross-validation. As expected, it has a very high positive predictive value of 89%: it predicts with high accuracy the subset of mutations that are deleterious because of stability issues, but is by construction unable to classify variants that are deleterious for other reasons. Combining it with an evolution-based predictor increases the balanced accuracy to 75% and allows more than one quarter of the variants to be predicted with 95% positive predictive value. Our method, called SNPMuSiC, can be used with both experimental and modeled structures and compares favorably with other prediction tools on several independent test sets. It constitutes a step towards interpreting variant effects at the molecular scale. SNPMuSiC is freely available at https://soft.dezyme.com/.
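The two metrics quoted above, balanced accuracy and positive predictive value, can be illustrated with a minimal sketch. The function names and the confusion-matrix counts below are hypothetical and chosen only for illustration; they are not taken from the SNPMuSiC evaluation. The sketch shows how a predictor that flags only part of the deleterious variants can still reach a high positive predictive value.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of variants predicted deleterious that really are deleterious."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Hypothetical counts (assumption, not from the paper): the predictor
    # recovers only half of the deleterious variants but rarely mislabels
    # neutral ones.
    tp, fn, tn, fp = 50, 50, 90, 10
    print(f"balanced accuracy: {balanced_accuracy(tp, fn, tn, fp):.2f}")
    print(f"positive predictive value: {positive_predictive_value(tp, fp):.2f}")
```

With counts of this kind, sensitivity is moderate while specificity and positive predictive value are high, which is the pattern expected from a classifier that by construction detects only stability-related deleterious variants.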
