Comprehensively evaluating cis-regulatory variation in the human prostate transcriptome by using gene-level allele-specific expression.

The identification of cis-acting regulatory variation in primary tissues has the potential to elucidate the genetic basis of complex traits and to further our understanding of transcriptomic diversity across cell types. Expression quantitative trait locus (eQTL) association analysis using RNA sequencing (RNA-seq) data can improve the detection of cis-acting regulatory variation by incorporating allele-specific expression (ASE) patterns into the association analysis. Here, we present a comprehensive evaluation of cis-acting eQTLs based on RNA-seq gene-expression data and genome-wide high-density genotypes from 471 samples of normal primary prostate tissue. Using statistical models that integrate ASE information, we identified extensive cis-eQTLs across the prostate transcriptome and found that approximately 70% of expressed genes had a significant eQTL at a gene-level false-discovery rate of 0.05. Overall, cis-eQTLs were heavily concentrated near the transcription start and stop sites of the affected genes, and effect sizes were negatively correlated with distance. We identified multiple instances of cis-acting co-regulation by using phased genotype data and found 233 SNPs that were each the most strongly associated eQTL for more than one gene. We also noted significant enrichment of previously reported prostate cancer risk SNPs among prostate eQTLs (25/50, p = 2 × 10^-5). Our results illustrate the benefit of incorporating ASE data into cis-eQTL analyses: this approach reproduced prior eQTL findings better than eQTL mapping based on total expression alone. Altogether, our analysis provides extensive functional context for thousands of SNPs in prostate tissue, and these results will be of critical value in guiding studies of disease in the human prostate.
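The abstract does not spell out how ASE information enters the model. The core intuition can be illustrated with a deliberately simplified sketch, which is not the authors' method. Among individuals heterozygous at a candidate cis-eQTL, a true cis effect shifts the fraction of reads from one haplotype away from 0.5. The sketch assumes phasing has already oriented each individual's (reference, alternate) read counts to the eQTL alleles, pools those counts, and runs a binomial likelihood-ratio test of a shared allelic ratio against 0.5. The function names `binom_loglik` and `ase_lrt` and the input format are hypothetical.

```python
import math
from typing import List, Tuple


def binom_loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood, dropping the constant log C(n, k) term."""
    # Skip zero-count terms so log(0) is never evaluated when p is 0 or 1.
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def ase_lrt(counts: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Pooled allelic-imbalance likelihood-ratio test (simplified sketch).

    counts: one (ref_reads, alt_reads) pair per individual heterozygous at the
    candidate eQTL, assumed already aligned to eQTL alleles via phasing
    (hypothetical input format). Returns (p_hat, LRT statistic, p-value).
    """
    k = sum(ref for ref, _ in counts)
    n = sum(ref + alt for ref, alt in counts)
    if n == 0:
        raise ValueError("no allele-informative reads")
    p_hat = k / n  # MLE of the shared allelic ratio under the alternative
    stat = 2.0 * (binom_loglik(k, n, p_hat) - binom_loglik(k, n, 0.5))
    stat = max(stat, 0.0)  # clip tiny negative values from rounding
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return p_hat, stat, p_value


if __name__ == "__main__":
    # Toy counts for three heterozygous individuals (made up for illustration).
    print(ase_lrt([(30, 10), (25, 15), (40, 12)]))
```

Pooling raw counts treats all reads as independent and ignores between-individual overdispersion. A beta-binomial likelihood is the usual refinement, and joint models combine this allelic term with a total-expression term across all genotype classes.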
