GAVIN: Gene-Aware Variant INterpretation for medical sequencing

We present Gene-Aware Variant INterpretation (GAVIN), a new method that accurately classifies variants for clinical diagnostic purposes. Classifications are based on gene-specific calibrations of allele frequencies from the ExAC database, likely variant impact predicted by SnpEff, and estimated deleteriousness based on CADD scores for more than 3,000 genes. In a benchmark on 18 clinical gene sets, GAVIN achieves a sensitivity of 91.4% and a specificity of 76.9%, an accuracy unmatched by 12 other tools. We provide GAVIN as an online MOLGENIS service for annotating VCF files and as an open-source executable for use in bioinformatics pipelines. It is available at http://molgenis.org/gavin.
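To make the idea of gene-specific calibration concrete, the following is a minimal sketch of how a per-gene rule might combine the three kinds of evidence named above, and how sensitivity and specificity can be computed on a labeled benchmark. It is not the GAVIN implementation: the `GeneCalibration` fields, the threshold values, the order of the checks, and the treatment of SnpEff `HIGH` impact as sufficient evidence are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCalibration:
    """Hypothetical per-gene thresholds (all values used below are invented)."""
    max_pathogenic_maf: float  # allele frequencies above this are called benign
    cadd_pathogenic: float     # CADD at or above this is called pathogenic
    cadd_benign: float         # CADD at or below this is called benign


def classify(maf: float, impact: str, cadd: float, cal: GeneCalibration) -> str:
    """Classify one variant using thresholds calibrated for its gene.

    Assumed rule order (illustrative only): allele frequency first,
    then predicted impact, then CADD score.
    """
    if maf > cal.max_pathogenic_maf:
        return "benign"
    if impact == "HIGH":  # SnpEff impact categories: HIGH, MODERATE, LOW, MODIFIER
        return "pathogenic"
    if cadd >= cal.cadd_pathogenic:
        return "pathogenic"
    if cadd <= cal.cadd_benign:
        return "benign"
    return "vous"  # variant of uncertain significance


def sensitivity_specificity(truth: list[str], predicted: list[str]) -> tuple[float, float]:
    """Sensitivity: fraction of truly pathogenic variants called pathogenic.
    Specificity: fraction of truly benign variants called benign.
    Assumes both classes occur at least once in `truth`.
    """
    pairs = list(zip(truth, predicted))
    n_path = sum(t == "pathogenic" for t, _ in pairs)
    n_benign = sum(t == "benign" for t, _ in pairs)
    tp = sum(t == "pathogenic" and p == "pathogenic" for t, p in pairs)
    tn = sum(t == "benign" and p == "benign" for t, p in pairs)
    return tp / n_path, tn / n_benign


# Invented thresholds for a single hypothetical gene.
example_cal = GeneCalibration(max_pathogenic_maf=0.001,
                              cadd_pathogenic=25.0,
                              cadd_benign=10.0)
```

Under these assumptions, a variant that is too common in the population is called benign regardless of its score, while a rare variant with an intermediate CADD score is left unclassified. A different gene would carry its own `GeneCalibration`, which is the point of calibrating per gene rather than using one genome-wide cutoff.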
