TIVAN: tissue-specific cis-eQTL single nucleotide variant annotation and prediction

SUMMARY
Predicting genetic regulatory variants, most of which are located in non-coding genomic regions, remains a challenge in genetic research. Among non-coding regulatory variants, cis-eQTL single nucleotide variants (SNVs) are of particular interest because of their crucial role in regulating gene expression. Since different gene expression patterns are believed to contribute to the etiologies of different phenotypes, it is desirable to characterize the impact of cis-eQTL SNVs in a context-specific manner. Although computational methods for predicting whether variants are pathogenic or deleterious are well established, methods for annotating and predicting cis-eQTL SNVs are underdeveloped. Here, we present TIVAN (TIssue-specific Variant ANnotation and prediction), an ensemble method of decision trees for predicting tissue-specific cis-eQTL SNVs. TIVAN is trained on a comprehensive collection of features, including genome-wide genomic and epigenomic profiling data. TIVAN accurately discriminates cis-eQTL SNVs from non-eQTL SNVs and compares favorably with other methods, obtaining higher five-fold cross-validation AUC values (CV-AUC) and Leave-One-Chromosome-Out predicted AUC values (LOCO-AUC) across 44 tissues belonging to 27 tissue classes. Finally, TIVAN consistently maintains top performance on an independent testing dataset covering 7 tissues in 11 studies.

AVAILABILITY AND IMPLEMENTATION
TIVAN software is available at https://github.com/lichen-lab/TIVAN.

SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
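The Leave-One-Chromosome-Out (LOCO) evaluation mentioned in the summary can be sketched in plain Python. This is not TIVAN's code. A bagged ensemble of decision stumps (depth-1 trees) stands in for TIVAN's tree ensemble. The toy data, the three-feature layout and every function name (`auc`, `fit_stump`, `fit_bagged_stumps`, `loco_auc`, `make_toy_data`) are hypothetical. The sketch shows only the protocol: each chromosome is held out in turn, the model is trained on the remaining chromosomes, and a rank-based (Mann-Whitney) AUC is computed on the held-out variants. A common rationale for splitting by chromosome is that nearby, correlated variants then cannot sit on both sides of the train/test split.

```python
import random


def auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC: the probability that a random positive
    outscores a random negative, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_stump(X, y):
    """Fit a depth-1 decision tree: choose the (feature, threshold) split that
    minimises within-leaf squared error; each leaf predicts its positive rate."""
    base = sum(y) / len(y)
    best = (float("inf"), 0, float("inf"), base, base)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            pl = sum(left) / len(left) if left else base
            pr = sum(right) / len(right) if right else base
            sse = len(left) * pl * (1 - pl) + len(right) * pr * (1 - pr)
            if sse < best[0]:
                best = (sse, j, t, pl, pr)
    _, j, t, pl, pr = best
    return lambda row: pl if row[j] <= t else pr


def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample, then average their outputs."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(s(row) for s in stumps) / len(stumps)


def loco_auc(X, y, chroms, fit=fit_bagged_stumps):
    """Leave-One-Chromosome-Out: for each chromosome, train on all other
    chromosomes and compute the AUC on the held-out chromosome's variants."""
    per_chrom = {}
    for c in sorted(set(chroms)):
        train = [i for i, ch in enumerate(chroms) if ch != c]
        test = [i for i, ch in enumerate(chroms) if ch == c]
        model = fit([X[i] for i in train], [y[i] for i in train])
        test_y = [y[i] for i in test]
        if len(set(test_y)) < 2:
            continue  # skip chromosomes where AUC is undefined
        per_chrom[c] = auc(test_y, [model(X[i]) for i in test])
    return per_chrom


def make_toy_data(n_per_chrom=30, chroms=("chr1", "chr2", "chr3", "chr4"), seed=1):
    """Hypothetical toy data: one informative feature plus two noise features;
    label 1 stands for a cis-eQTL SNV and 0 for a non-eQTL SNV."""
    rng = random.Random(seed)
    X, y, ch = [], [], []
    for c in chroms:
        for k in range(n_per_chrom):
            label = k % 2  # balanced classes on every chromosome
            X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)])
            y.append(label)
            ch.append(c)
    return X, y, ch


if __name__ == "__main__":
    X, y, chroms = make_toy_data()
    results = loco_auc(X, y, chroms)
    for c, a in results.items():
        print(c, round(a, 3))
    print("mean LOCO-AUC:", round(sum(results.values()) / len(results), 3))
```

Averaging the per-chromosome AUCs is one aggregation choice; pooling all held-out scores and computing a single AUC is another. The summary does not say which one TIVAN uses.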
