A Robust Test for Two‐Stage Design in Genome‐Wide Association Studies

Summary A two‐stage design is cost‐effective for genome‐wide association studies (GWAS) that test hundreds of thousands of single nucleotide polymorphisms (SNPs). In this design, each SNP is genotyped in stage 1 using a fraction of the case–control samples. The top‐ranked SNPs are then selected and genotyped in stage 2 using additional samples, and a joint analysis combining the statistics from both stages is applied in the second stage. Follow‐up studies can also be regarded as a two‐stage design: once potential SNPs are identified, independent samples are genotyped and analyzed, either separately or jointly with the previous data, to confirm the findings. When the underlying genetic model is known, an asymptotically optimal trend test (TT) can be used at each analysis. In practice, however, the genetic models for SNPs with true associations are usually unknown, and the existing methods for analyzing two‐stage designs and follow‐up studies are not robust across different genetic models. We propose a simple robust procedure with genetic model selection for two‐stage GWAS. Our results show that, if the optimal TT has about 80% power when the genetic model is known, then the existing methods for analyzing the two‐stage design have a minimum power of about 20% across the four common genetic models when the true model is unknown, whereas our robust procedure has a minimum power of about 70% across the same models. The results also apply to follow‐up and replication studies with a joint analysis.
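To illustrate why the choice of genetic model matters for the trend test, here is a minimal sketch of the standard Cochran–Armitage trend statistic for a single SNP in a case–control sample. It is not the robust procedure proposed in the paper. The function names, the `SCORES` table, and the genotype counts are hypothetical and chosen only for illustration. The scores (0, x, 1) with x = 0, 0.5, 1 are the conventional choices for the recessive, additive, and dominant models.

```python
import math

# Conventional trend-test scores for genotypes carrying 0, 1, 2 risk alleles.
# These are illustrative assumptions; the summary does not list them.
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}


def trend_test(cases, controls, scores):
    """Cochran-Armitage trend statistic Z for 2x3 genotype counts.

    cases, controls: counts (n0, n1, n2) by number of risk alleles.
    scores: the three genotype scores (x0, x1, x2).
    """
    r_total = sum(cases)        # number of cases
    s_total = sum(controls)     # number of controls
    n_total = r_total + s_total
    n = [r + s for r, s in zip(cases, controls)]  # pooled genotype counts

    # Score-weighted contrast between case and control genotype counts.
    numer = sum(
        x * (s_total * r - r_total * s)
        for x, r, s in zip(scores, cases, controls)
    )
    # Variance term under the null hypothesis of no association.
    var = r_total * s_total * (
        n_total * sum(x * x * ni for x, ni in zip(scores, n))
        - sum(x * ni for x, ni in zip(scores, n)) ** 2
    )
    if var <= 0:
        raise ValueError("degenerate table: scores do not vary in the sample")
    return math.sqrt(n_total) * numer / math.sqrt(var)


def two_sided_p(z):
    """Two-sided p-value from the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical genotype counts, not data from the paper.
    cases = (20, 50, 30)
    controls = (40, 45, 15)
    for model, scores in SCORES.items():
        z = trend_test(cases, controls, scores)
        print(f"{model:9s} Z = {z:.3f}  p = {two_sided_p(z):.3g}")
```

Running the three score sets on the same table shows how the statistic changes with the assumed model, which is the sensitivity that motivates a robust procedure when the true model is unknown.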
