FIRE: functional inference of genetic variants that regulate gene expression

Motivation: Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods that predict whether individual SNVs are likely to regulate gene expression would aid the interpretation of variants of unknown significance identified in whole‐genome sequencing studies.

Results: We developed FIRE (Functional Inference of Regulators of Expression), a tool that scores both noncoding and coding SNVs by their potential to regulate the expression levels of nearby genes. FIRE consists of 23 random forests trained to recognize SNVs in cis‐expression quantitative trait loci (cis‐eQTLs), using a set of 92 genomic annotations as predictive features. FIRE scores discriminate cis‐eQTL SNVs from non‐eQTL SNVs in the training set with a cross‐validated area under the receiver operating characteristic curve (AUC) of 0.807, and discriminate cis‐eQTL SNVs shared across six populations of different ancestry from non‐eQTL SNVs with an AUC of 0.939. FIRE scores are also predictive of cis‐eQTL SNVs across a variety of tissue types.

Availability and implementation: FIRE scores for genome‐wide SNVs in hg19/GRCh37 are available for download at https://sites.google.com/site/fireregulatoryvariation/.

Contact: nilah@stanford.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
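The AUCs quoted in the Results (0.807 and 0.939) summarize how well FIRE scores rank cis‐eQTL SNVs above non‐eQTL SNVs. As a minimal sketch, not FIRE's own evaluation code, the AUC can be computed as the normalized Mann-Whitney U statistic: the probability that a randomly chosen cis‐eQTL SNV receives a higher score than a randomly chosen non‐eQTL SNV, with ties counted as one half. The function name `auc` and the toy scores are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC (normalized Mann-Whitney U); tied scores share an average rank."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    # Label 1 = positive (e.g. cis-eQTL SNV), 0 = negative (e.g. non-eQTL SNV).
    labeled = sorted([(s, 1) for s in pos_scores] + [(s, 0) for s in neg_scores])
    n = len(labeled)
    pos_rank_sum = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores labeled[i..j].
        j = i
        while j + 1 < n and labeled[j + 1][0] == labeled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        pos_rank_sum += avg_rank * sum(lab for _, lab in labeled[i:j + 1])
        i = j + 1
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    # U counts positive-over-negative wins (ties contribute 1/2).
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy scores, not FIRE output.
eqtl_scores = [0.9, 0.8, 0.4]
non_eqtl_scores = [0.7, 0.3, 0.2, 0.4]
print(auc(eqtl_scores, non_eqtl_scores))  # prints 0.875
```

Sorting makes this O(n log n), which matters at genome scale. Library routines such as scikit-learn's `roc_auc_score` compute the same quantity.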