Including sampling and phenotyping costs into the optimization of two stage designs for genome wide association studies

We propose optimized two‐stage designs for genome‐wide case‐control association studies, using a hypothesis testing paradigm. To save genotyping costs, the complete marker set is genotyped in a sub‐sample only (stage I). In stage II, the most promising markers are then genotyped in the remaining sub‐sample. In recent publications, two‐stage designs were proposed that minimize the overall genotyping costs. To achieve full design optimization, we additionally include sampling costs in both the cost function and the design optimization. The resulting optimal designs differ markedly from those optimized for genotyping costs only (partially optimized designs), and achieve considerable further cost reductions. Compared with partially optimized designs, fully optimized two‐stage designs have a higher first‐stage sample proportion. Furthermore, the increase in sample size over the one‐stage design, which is necessary in two‐stage designs to compensate for the loss of power due to partial genotyping, is less pronounced for fully optimized two‐stage designs. In addition, we address the scenario where the investigator wants to gain as much information as possible but is restricted by a budget. For this scenario, we develop two‐stage designs that maximize the power under a given cost constraint. Genet. Epidemiol. 2007. © 2007 Wiley‐Liss, Inc.
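
The difference between counting genotyping costs only and additionally counting sampling costs can be illustrated with a minimal sketch. The cost model below is an assumption made for illustration, not the cost function of the paper: it charges a fixed price per genotype and a fixed price per recruited individual. All function and parameter names are hypothetical.

```python
def two_stage_cost(n_total, stage1_fraction, n_markers, n_markers_stage2,
                   cost_per_genotype, cost_per_sample=0.0):
    """Total cost of a two-stage design under a simple linear cost model.

    Assumed model (illustrative only):
      - stage I genotypes all n_markers in a fraction of the sample,
      - stage II genotypes only n_markers_stage2 selected markers
        in the remaining individuals,
      - every individual incurs cost_per_sample for recruitment
        and phenotyping, regardless of stage.
    """
    n_stage1 = stage1_fraction * n_total
    n_stage2 = n_total - n_stage1
    genotyping = cost_per_genotype * (
        n_stage1 * n_markers + n_stage2 * n_markers_stage2
    )
    sampling = cost_per_sample * n_total
    return genotyping + sampling


def one_stage_cost(n_total, n_markers, cost_per_genotype, cost_per_sample=0.0):
    """Cost of genotyping every marker in every individual."""
    return n_total * (cost_per_genotype * n_markers + cost_per_sample)


if __name__ == "__main__":
    # Genotyping costs only (cost_per_sample left at 0).
    print(two_stage_cost(1000, 0.5, 100, 10, cost_per_genotype=1.0))
    # Same design with sampling costs included.
    print(two_stage_cost(1000, 0.5, 100, 10, cost_per_genotype=1.0,
                         cost_per_sample=20.0))
```

Under this assumed model, the sampling term grows with the total sample size only, so any increase in sample size needed to preserve power becomes more expensive once sampling costs are counted. This is consistent with the observation that fully optimized designs require a smaller sample size increase over the one‐stage design.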