Bayesian methods to overcome the winner’s curse in genetic studies

Parameter estimates for associated genetic variants reported in initial discovery samples are often grossly inflated compared to the values observed in follow-up replication samples. This bias is a consequence of the sequential procedure, in which the estimated effect of an associated genetic marker must first pass a stringent significance threshold before it is reported. We propose a hierarchical Bayes method in which a spike-and-slab prior accounts for the possibility that the significant test result is due to chance. We examine the robustness of the method using different priors, corresponding to different degrees of confidence in the testing results, and propose a Bayesian model averaging procedure to combine the estimates produced by the different models. The Bayesian estimators have smaller variance than the conditional likelihood estimator and outperform it in studies with low power. We investigate the performance of the method with simulations and with applications to four real data examples.
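
The selection effect described above can be illustrated with a small simulation. This is a minimal sketch of the bias itself, not of the hierarchical Bayes method proposed here. It assumes a normally distributed effect estimate, and the function name `simulate_winners_curse`, the true effect (0.1), the standard error (0.05), and the z-score threshold (3.0) are all hypothetical values chosen for illustration.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.1, std_error=0.05,
                           z_threshold=3.0, n_studies=20000, seed=0):
    """Simulate discovery studies and compare all estimates with the
    subset that passes a two-sided significance threshold.

    All parameter values are illustrative assumptions, not taken from
    any real study.
    """
    rng = random.Random(seed)
    # Each discovery study yields an unbiased, normally distributed estimate.
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # Keep only estimates whose z-score passes the threshold ("winners").
    significant = [b for b in estimates if abs(b / std_error) > z_threshold]
    return {
        "true_effect": true_effect,
        "mean_all": statistics.fmean(estimates),
        "mean_significant": statistics.fmean(significant),
        "n_significant": len(significant),
    }


if __name__ == "__main__":
    result = simulate_winners_curse()
    for key, value in result.items():
        print(f"{key}: {value}")
```

Under these assumptions the average over all simulated studies stays close to the true effect, while the average over the studies that pass the threshold is noticeably larger. The lower the power, the larger this gap becomes.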
