Insights into protein structural, physicochemical, and functional consequences of missense variants in 1,330 disease-associated human genes

Inference of the structural and functional consequences of amino acid-altering missense variants is challenging and not yet scalable. Clinical and research applications of the colossal number of identified missense variants are thus limited. Here we describe the aggregation and analysis of large-scale genomic variation and structural biology data for 1,330 disease-associated genes. Comparing the burden of 40 structural, physicochemical, and functional protein features of altered amino acids with 3-dimensional coordinates, we found 18 features associated with pathogenic missense variants and 14 associated with population missense variants. Separate analyses of variants from 24 protein functional classes revealed novel function-dependent vulnerable features. We then devised a quantitative spectrum that identifies variants carrying a greater number of pathogenic variant-associated features. Finally, we developed a web resource (MISCAST; http://miscast.broadinstitute.org/) for interactive analysis of variants on linear and tertiary protein structures. Information on the biological impact of missense variants, available through the web tool, will assist researchers in forming hypotheses about variant pathogenicity and disease trajectories.
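
The abstract does not say which statistic was used to compare feature burden between pathogenic and population variants. As a minimal sketch of one common way to make such a comparison, the Python below computes an odds ratio with a 95% Wald confidence interval from a 2x2 table. The function name `burden_odds_ratio`, the choice of test, and all counts are hypothetical illustrations, not the authors' method or data.

```python
import math

def burden_odds_ratio(path_with, path_total, pop_with, pop_total):
    """Odds ratio and 95% Wald CI for one protein feature.

    Compares how often the feature is seen at pathogenic variant
    positions versus population variant positions. This is an assumed,
    generic enrichment statistic, not the one used in the paper.
    """
    a = path_with                # pathogenic, feature present
    b = path_total - path_with   # pathogenic, feature absent
    c = pop_with                 # population, feature present
    d = pop_total - pop_with     # population, feature absent
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction avoids division by zero
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper

# Hypothetical counts: feature seen at 300 of 1,000 pathogenic variant
# positions and at 150 of 1,000 population variant positions.
or_, lower, upper = burden_odds_ratio(300, 1000, 150, 1000)
print(f"OR = {or_:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Under this sketch, an interval lying entirely above 1 would mark a feature as enriched among pathogenic variants, and one lying entirely below 1 as enriched among population variants. Screening 40 features this way would also call for a multiple-testing correction.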
