ReQTL: identifying correlations between expressed SNVs and gene expression using RNA-sequencing data

Abstract

Motivation: By testing for associations between DNA genotypes and gene expression levels, expression quantitative trait locus (eQTL) analyses have been instrumental in understanding how thousands of single nucleotide variants (SNVs) may affect gene expression. Compared with DNA genotypes, RNA genetic variation represents a phenotypic trait that reflects the actual allele content of the studied system. RNA genetic variation at expressed SNV loci can be estimated using the proportion of alleles bearing the variant nucleotide (variant allele fraction, VAF_RNA). VAF_RNA is a continuous measure that allows precise allele quantitation at loci where the RNA alleles do not scale with the genotype count. We describe a method to correlate VAF_RNA with gene expression and assess its ability to identify genetically regulated expression solely from RNA-sequencing (RNA-seq) datasets.

Results: We introduce ReQTL, a modification of eQTL analysis that replaces the DNA allele count with the variant allele fraction at expressed SNV loci in the transcriptome (VAF_RNA). We exemplify the method on sets of RNA-seq data from human tissues obtained through the Genotype-Tissue Expression (GTEx) project and demonstrate that ReQTL analyses are computationally feasible and can identify a subset of expressed eQTL loci.

Availability and implementation: A toolkit to perform ReQTL analyses is available at https://github.com/HorvathLab/ReQTL.

Supplementary information: Supplementary data are available at Bioinformatics online.
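The core idea of the abstract, computing VAF_RNA from allele read counts and correlating it with gene expression across samples, can be illustrated with a minimal Python sketch. This is not the ReQTL toolkit's implementation: the function names, the minimum-coverage threshold of 10 reads, the toy read counts and expression values, and the use of a plain Pearson correlation in place of a full eQTL-style regression are all assumptions made for illustration.

```python
from math import sqrt


def vaf_rna(n_var, n_ref, min_reads=10):
    """Variant allele fraction at one SNV in one sample.

    Returns None when total coverage is below min_reads
    (the threshold is an illustrative assumption).
    """
    total = n_var + n_ref
    if total < min_reads:
        return None
    return n_var / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlate_snv_with_gene(read_counts, expression, min_reads=10):
    """Correlate VAF_RNA at one SNV with one gene's expression.

    read_counts: list of (variant_reads, reference_reads), one per sample.
    expression:  list of expression values for the same samples.
    Samples with insufficient coverage at the SNV are skipped.
    Returns (correlation, number_of_samples_used).
    """
    vafs, exprs = [], []
    for (n_var, n_ref), expr in zip(read_counts, expression):
        vaf = vaf_rna(n_var, n_ref, min_reads)
        if vaf is not None:
            vafs.append(vaf)
            exprs.append(expr)
    return pearson_r(vafs, exprs), len(vafs)


# Hypothetical toy data: six samples, the last one with too few reads.
counts = [(0, 20), (5, 15), (10, 10), (15, 5), (20, 0), (1, 2)]
expr = [1.0, 2.0, 3.0, 4.0, 5.0, 9.0]
r, n_used = correlate_snv_with_gene(counts, expr)
```

In this toy input the low-coverage sample is dropped and the remaining five VAF_RNA values rise linearly with expression, so the correlation is essentially 1. A real analysis would test many SNV-gene pairs, include covariates, and correct for multiple testing.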
