Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations

To deal with the huge number of novel protein-coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant datasets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top-ranking predictors, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stands out, showing both the strongest correlations with DMS data and the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO and REVEL based upon their performance in these analyses.
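The two kinds of comparison described above can be illustrated with a minimal sketch. It assumes a rank (Spearman) correlation as the agreement measure between DMS scores and VEP scores, and the area under the ROC curve as the measure of pathogenic versus benign discrimination; the abstract does not name the exact statistics, so both are stand-ins. All function names and all numbers below are hypothetical and are not taken from the study.

```python
from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def roc_auc(scores: Sequence[float], is_pathogenic: Sequence[bool]) -> float:
    """Probability that a pathogenic variant outscores a benign one (ties count 0.5)."""
    pos = [s for s, p in zip(scores, is_pathogenic) if p]
    neg = [s for s, p in zip(scores, is_pathogenic) if not p]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data for five substitutions in one protein.
dms_fitness = [1.0, 0.9, 0.2, 0.1, 0.6]      # assumed: higher = more functional
vep_damage = [0.1, 0.3, 0.8, 0.9, 0.2]       # assumed: higher = more damaging
pathogenic = [False, False, True, True, False]

# The sign depends on score direction, so compare predictors on the absolute value.
agreement = abs(spearman(dms_fitness, vep_damage))
discrimination = roc_auc(vep_damage, pathogenic)
print(f"|Spearman| = {agreement:.2f}, AUC = {discrimination:.2f}")
```

Taking the absolute correlation is a design choice in this sketch: DMS assays and predictors do not share a score direction, so the sign alone says nothing about which predictor agrees better with the experiment.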
