Accounting for selection and correlation in the analysis of two-stage genome-wide association studies

The problem of selection bias has long been recognized in the analysis of two-stage trials, where promising candidates are selected in stage 1 for confirmatory analysis in stage 2. To correct for this bias efficiently, uniformly minimum variance conditionally unbiased estimators (UMVCUEs) have been proposed for a wide variety of trial settings, but these assume that the population parameter estimates are independent. We relax this assumption and derive the UMVCUE in the multivariate normal setting with an arbitrary known covariance structure. One area of application is the estimation of odds ratios (ORs) when combining a genome-wide scan with a replication study. Our framework explicitly accounts for correlated single nucleotide polymorphisms, as might occur due to linkage disequilibrium. We illustrate our approach on the measurement of the association between 11 genetic variants and the risk of Crohn's disease, as reported in Parkes and others (2007. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility. Nat. Gen. 39(7), 830–832.), and show that the estimated ORs can change substantially when both selection and correlation are taken into account.
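The selection bias described above can be illustrated with a small Monte Carlo sketch. This is not the UMVCUE derived in the paper; it only shows why a correction is needed. The function name, the effect sizes, the equicorrelated covariance matrix, and the choice to give stage 2 the same covariance as stage 1 are all assumptions made for illustration.

```python
import numpy as np

def simulate_selection_bias(mu, cov, n_sims=20000, seed=0):
    """Estimate the bias of stage 1 and stage 2 estimates for the selected candidate.

    Hypothetical setup: stage 1 and stage 2 estimates are independent draws
    from N(mu, cov); the candidate with the largest stage 1 estimate is selected.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    stage1 = rng.multivariate_normal(mu, cov, size=n_sims)
    stage2 = rng.multivariate_normal(mu, cov, size=n_sims)
    winner = stage1.argmax(axis=1)          # index selected in each simulated trial
    rows = np.arange(n_sims)
    true_effect = mu[winner]                # true parameter of the selected candidate
    bias_stage1 = (stage1[rows, winner] - true_effect).mean()
    bias_stage2 = (stage2[rows, winner] - true_effect).mean()
    return bias_stage1, bias_stage2

# Assumed example: 5 candidates with no true effect (e.g. log-ORs of 0),
# variance 0.1 and pairwise correlation 0.5 between estimates.
k, rho, var = 5, 0.5, 0.1
mu = np.zeros(k)
cov = var * ((1 - rho) * np.eye(k) + rho * np.ones((k, k)))

bias_stage1, bias_stage2 = simulate_selection_bias(mu, cov)
print(f"stage 1 bias of selected candidate: {bias_stage1:.3f}")
print(f"stage 2 bias of selected candidate: {bias_stage2:.3f}")
```

Under these assumptions the stage 1 estimate of the selected candidate is biased upwards, while the independent stage 2 estimate is approximately unbiased but discards the stage 1 data. A conditionally unbiased estimator of the kind discussed in the abstract aims to use both stages without inheriting the stage 1 bias.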

[1] Allan R. Sampson, et al. Extension of a Two-Stage Conditionally Unbiased Estimator of the Selected Population to the Bivariate Normal Case, 2007.

[2] Jack Bowden, et al. Unbiased estimation of odds ratios: combining genomewide association scans with replication studies, 2009, Genetic epidemiology.

[3] Sonja W. Scholz, et al. A two-stage genome-wide association study of sporadic amyotrophic lateral sclerosis, 2009, Human molecular genetics.

[4] Jack Bowden, et al. Unbiased Estimation of Selected Treatment Means in Two‐Stage Trials, 2008, Biometrical journal. Biometrische Zeitschrift.

[5] Jack Bowden, et al. Conditionally unbiased and near unbiased estimation of the selected treatment mean for multistage drop-the-losers trials, 2013, Biometrical journal. Biometrische Zeitschrift.

[6] Pak Chung Sham, et al. Two-stage genome-wide association study identifies variants in CAMSAP1L1 as susceptibility loci for epilepsy in Chinese, 2012, Human molecular genetics.

[7] R. Elston, et al. Optimal two‐stage genotyping in population‐based association studies, 2003, Genetic epidemiology.

[8] Harold B. Sackrowitz, et al. Two stage conditionally unbiased estimators of the selected mean, 1989.

[9] Margaret Sullivan Pepe, et al. Conditional estimation after a two-stage diagnostic biomarker study that allows early termination for futility, 2012, Statistics in medicine.

[10] Nigel Stallard, et al. Conditionally unbiased estimation in phase II/III clinical trials with early stopping for futility, 2013, Statistics in medicine.

[11] Simon C. Potter, et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls, 2007, Nature.

[12] C. Begg, et al. Two‐Stage Designs for Gene–Disease Association Studies, 2002, Biometrics.

[13] Jack Bowden, et al. Correcting for bias in the selection and validation of informative diagnostic tests, 2015, Statistics in medicine.

[14] Martin Posch, et al. Testing and estimation in flexible group sequential designs with adaptive treatment selection, 2005, Statistics in medicine.

[15] Sambasivarao Damaraju, et al. A two-stage association study identifies methyl-CpG-binding domain protein 2 gene polymorphisms as candidates for breast cancer susceptibility, 2012, European Journal of Human Genetics.

[16] Frank Bretz, et al. Twenty‐five years of confirmatory adaptive designs: opportunities and pitfalls, 2015, Statistics in medicine.

[17] R. Prentice, et al. Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies, 2008, Biostatistics.

[18] W. Brannath, et al. Selection and bias—Two hostile brothers, 2009, Statistics in medicine.

[19] Margaret Sullivan Pepe, et al. Conditional estimation of sensitivity and specificity from a phase 2 biomarker study allowing early termination for futility, 2009, Statistics in medicine.

[20] Allan R Sampson, et al. Drop‐the‐Losers Design: Normal Case, 2005, Biometrical journal. Biometrische Zeitschrift.

[21] C. Begg, et al. Two‐Stage Designs for Gene–Disease Association Studies with Sample Size Constraints, 2004, Biometrics.

[22] Toshihiro Tanaka. The International HapMap Project, 2003, Nature.

[23] Kenny Q. Ye, et al. An integrated map of genetic variation from 1,092 human genomes, 2012, Nature.

[24] Alastair Forbes, et al. Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility, 2007, Nature Genetics.

[25] John P. A. Ioannidis, et al. Validating, augmenting and refining genome-wide association signals, 2009, Nature Reviews Genetics.

[26] Juan Pablo Lewinger, et al. Methodological Issues in Multistage Genome-wide Association Studies, 2009, Statistical science : a review journal of the Institute of Mathematical Statistics.

[27] D. Thomas, et al. Two‐Stage sampling designs for gene association studies, 2004, Genetic epidemiology.

[28] Frank Bretz, et al. Tutorial in Biostatistics: Adaptive designs for confirmatory clinical trials, 2022.