Inferring the molecular and phenotypic impact of amino acid variants with MutPred2

Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.
