A Bayesian framework for efficient and accurate variant prediction

There is a growing need for variant prediction tools that can assess a wide spectrum of evidence. We present a Bayesian framework that aggregates pathogenicity data from multiple in silico scores on a gene-by-gene basis, combines them with multiple evidence statistics in both quantitative and qualitative forms, and performs 5-tiered variant classification based on the credible interval of the resulting probability. When evaluated on 1,161 missense variants, our gene-specific in silico meta-predictor yielded an area under the curve (AUC) of 96.0% and outperformed all other in silico predictors. Multifactorial model analysis incorporating all available evidence yielded an AUC of 99.7%, with 22.8% of variants predicted as variants of uncertain significance (VUS). Using only 3 automatically computed evidence statistics yielded an AUC of 98.6% with 56.0% predicted as VUS, which is sufficient accuracy to rapidly assign a substantial portion of VUS to clinically meaningful classifications. Collectively, our findings support using this framework for large-scale variant prioritization with in silico predictors, followed by variant prediction and classification with a high degree of predictive accuracy.
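
As a rough illustration of the general idea, and not the authors' implementation, the sketch below combines a prior probability with independent likelihood ratios via Bayes' rule and then maps a credible interval onto five tiers. The function names, the tier cutoffs (borrowed from commonly used IARC-style five-class schemes), and the rule that an interval straddling two tiers falls back to VUS are all assumptions; the abstract does not state the actual thresholds or decision rule.

```python
from math import prod

def posterior_probability(prior, likelihood_ratios):
    """Combine a prior probability with independent likelihood ratios.

    Hypothetical helper: posterior odds = prior odds * product of LRs.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)

# Assumed cutoffs (IARC-style five-class convention), not taken from the paper.
TIERS = [
    (0.99, 1.0, "pathogenic"),
    (0.95, 0.99, "likely pathogenic"),
    (0.05, 0.95, "uncertain significance"),
    (0.001, 0.05, "likely benign"),
    (0.0, 0.001, "benign"),
]

def classify(lower, upper):
    """Assign a tier only if the whole credible interval lies inside it.

    Assumed rule: an interval that straddles a cutoff is reported as VUS.
    """
    for tier_low, tier_high, label in TIERS:
        if tier_low <= lower and upper <= tier_high:
            return label
    return "uncertain significance"

if __name__ == "__main__":
    # Illustrative numbers only; they are not data from the study.
    print(posterior_probability(0.5, [10.0, 10.0]))
    print(classify(0.992, 0.999))
    print(classify(0.90, 0.995))
```

Under this assumed rule, a wider credible interval makes a VUS call more likely, which is consistent with the abstract's observation that using fewer evidence statistics leaves a larger share of variants classified as VUS.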
