Optimal multistage designs: a general framework for efficient genome-wide association studies.

Genome-wide association studies (GWAS) have become increasingly affordable, but they remain costly. Cost-saving 2-stage designs have therefore been proposed in the literature. The restriction to 2 stages, however, seems artificial and does not exploit the full potential of the underlying methods. We extend the 2-stage approach to a general framework with an arbitrary number of stages. Based on the theory of group sequential methods, we derive optimal multistage designs. With current genotyping cost structures, our results suggest that up to 4 stages are sufficient to obtain feasible and efficient designs. Furthermore, we consider the problem of choosing the optimal number of stages depending on the cost of the statistical interim analysis at each stage, and we provide guidelines for planning the number of stages in practice. In particular, we found that in the majority of cases both 3-stage and 4-stage designs are more efficient than 2-stage designs. Although prices for marker panels continue to fall, we still recommend implementing and using optimal multistage designs in practice. Beyond the immediate benefit, experience in applying multistage designs will be needed to adapt the general framework to upcoming GWAS technologies.
