Power, false discovery rate and Winner’s Curse in eQTL studies

Abstract: Investigating the genetic architecture of gene expression traits has aided interpretation of disease- and trait-associated genetic variants; however, key aspects of expression quantitative trait loci (eQTL) study design and analysis remain understudied. We used extensive, empirically driven simulations to explore eQTL study design and the performance of various analysis strategies. Across multiple testing correction methods, false discoveries of genes with eQTLs (eGenes) were substantially inflated when false discovery rate (FDR) control was applied to all tests, and were appropriately controlled only when hierarchical procedures were used. All multiple testing correction procedures had low power and inflated FDR for eGenes whose causal SNPs had low allele frequencies in small samples (e.g. frequency <10% in 100 samples), indicating that even moderately low-frequency eQTL SNPs (eSNPs) in such studies are enriched for false discoveries. In scenarios with ≥80% power, the top eSNP was the true simulated eSNP 90% of the time, but substantially less often for very common eSNPs (minor allele frequency >25%). Overestimation of eQTL effect sizes, the so-called ‘Winner’s Curse’, was common in low and moderate power settings. To address this, we developed a bootstrap method (BootstrapQTL) that yielded more accurate effect size estimates. These insights provide a foundation for future eQTL studies, especially those with sampling constraints and subtly different conditions.
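As an illustration of what a hierarchical procedure can look like, the following is a minimal Python sketch and not the implementation evaluated in the study. It assumes one common two-step form: a local Bonferroni adjustment of the cis-SNP p-values within each gene, followed by a global Benjamini-Hochberg adjustment of the resulting gene-level p-values. The function names, the choice of Bonferroni as the local step, and the toy p-values are all assumptions made for the example.

```python
from typing import Dict, List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def hierarchical_egenes(
    cis_pvalues: Dict[str, List[float]], fdr: float = 0.05
) -> Dict[str, float]:
    """Hypothetical helper: map each significant eGene to its adjusted p-value.

    cis_pvalues maps a gene name to the nominal p-values of its cis-SNPs.
    """
    genes = list(cis_pvalues)
    # Step 1 (local, assumed Bonferroni): correct the best SNP p-value
    # for the number of SNPs tested for that gene.
    gene_level = [
        min(1.0, min(cis_pvalues[g]) * len(cis_pvalues[g])) for g in genes
    ]
    # Step 2 (global): control the FDR across genes, not across all tests.
    adjusted = benjamini_hochberg(gene_level)
    return {g: q for g, q in zip(genes, adjusted) if q <= fdr}


# Toy input, invented for illustration only.
toy = {
    "geneA": [1e-6, 0.2, 0.5, 0.9],
    "geneB": [0.03, 0.4],
    "geneC": [0.6, 0.7, 0.8],
}
egenes = hierarchical_egenes(toy, fdr=0.05)
print(sorted(egenes))
```

The point of the two-step design is that each gene contributes a single hypothesis to the global FDR step, so genes with many cis-SNPs do not dominate the set of tests. In practice the local step is often replaced by a permutation-based or linkage-disequilibrium-aware correction, because Bonferroni is conservative when SNPs are correlated.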
