Replicability analysis for genome-wide association studies

The paramount importance of replicating associations is well recognized in the genome-wide association (GWA) research community, yet methods for assessing the replicability of associations are scarce. Published GWA studies often report two separate meta-analyses, one combining the results of the primary studies and one combining the results of the follow-up studies. Reporting the two meta-analyses side by side gives only an informal sense of the replicability of the results. We suggest a formal empirical Bayes approach for discovering whether results have been replicated across studies, in which we estimate the optimal rejection region for discovering replicated results. We demonstrate, using realistic simulations, that the average false discovery proportion of our method remains small. We apply our method to six type 2 diabetes (T2D) GWA studies. Of the 803 SNPs found to be associated with T2D by a typical meta-analysis, 219 had replicated associations with T2D. We recommend complementing a meta-analysis with a replicability analysis for GWA studies.
