Optimal two‐stage genotyping designs for genome‐wide association scans

Fixed‐array, genome‐wide SNP genotyping technologies now make genome‐wide association scans feasible for large numbers of subjects. In this paper we revisit the problem of optimizing a two‐stage genotyping design (Satagopan and Elston [2003] Genet Epidemiol 25:149–157) to address new issues that arise when studies expand from candidate‐gene size to a genome‐wide scale. In the basic two‐stage approach, all markers are genotyped in an initial group of subjects (stage I) and only the promising markers are genotyped in additional subjects (stage II). We investigate how this approach can reduce genotyping cost in a genome‐wide case‐control association study, even when the per‐genotype cost of the specially designed assays used in stage II is much higher than that of the fixed SNP array used in stage I. In addition, we consider the use of measured SNPs to make (imperfect) predictions of unmeasured SNPs, so that association tests cover all SNPs (measured or unmeasured) genome wide, and the utility of increasing genotyping density in stage II in regions where significant associations were detected in stage I. Under a set of reasonable but conservative assumptions, we derive optimal two‐stage design configurations (sample sizes and significance thresholds in both stages); these optimal designs depend both on the total number of markers tested and on the ratio of stage II to stage I genotyping cost. We also show how existing software for power and sample size calculations can be used to design two‐stage studies under a wide range of assumptions about the number of markers genotyped and the costs of genotyping in each stage. Genet. Epidemiol. 2006. © 2006 Wiley‐Liss, Inc.
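The cost trade-off described above can be illustrated with a minimal sketch. It is not the paper's optimization procedure, and it rests on simplifying assumptions that are not stated in the abstract: nearly all markers are null, so the fraction promoted to stage II is approximately the stage I significance threshold, and cost is counted per genotype only. The function and parameter names (`relative_cost`, `frac_stage1`, `alpha1`, `cost_ratio`) are hypothetical.

```python
def relative_cost(frac_stage1: float, alpha1: float, cost_ratio: float) -> float:
    """Approximate genotyping cost of a two-stage design relative to
    genotyping every marker in every subject at the stage I price.

    Assumptions (illustrative, not taken from the paper):
      - frac_stage1: fraction of all subjects genotyped in stage I.
      - alpha1: stage I significance threshold; with almost all markers
        null, roughly this fraction of markers is promoted to stage II.
      - cost_ratio: per-genotype cost in stage II divided by that in stage I.
    """
    if not 0.0 <= frac_stage1 <= 1.0:
        raise ValueError("frac_stage1 must lie in [0, 1]")
    if not 0.0 <= alpha1 <= 1.0:
        raise ValueError("alpha1 must lie in [0, 1]")
    if cost_ratio <= 0.0:
        raise ValueError("cost_ratio must be positive")
    # Stage I: all markers, a fraction of the subjects, at unit cost.
    stage1 = frac_stage1
    # Stage II: promoted markers only, remaining subjects, at the higher cost.
    stage2 = (1.0 - frac_stage1) * alpha1 * cost_ratio
    return stage1 + stage2


if __name__ == "__main__":
    # Example: 30% of subjects in stage I, 1% of markers promoted,
    # stage II genotypes costing 20 times as much as stage I genotypes.
    print(relative_cost(0.3, 0.01, 20.0))
```

Under these assumptions the two-stage design costs less than the one-stage design whenever `alpha1 * cost_ratio` is below 1, which is why a much higher stage II per-genotype cost can still be tolerated. A real design would also have to constrain power, which this sketch leaves out.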

[1]  Tor D. Tosteson, et al. Designing a logistic regression study using surrogate measures for exposure and outcome, 1990.

[2]  D. Ruppert, et al. Measurement Error in Nonlinear Models, 1995.

[3]  Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[4]  N. Risch, et al. The Future of Genetic Studies of Complex Human Diseases, 1996, Science.

[5]  Francis S. Collins, et al. Variations on a Theme: Cataloging Human DNA Sequence Variation, 1997, Science.

[6]  N. Risch. Searching for genetic determinants in the new millennium, 2000, Nature.

[7]  D. Nickerson, et al. Variation is the spice of life, 2001, Nature Genetics.

[8]  W. Gauderman. Sample size requirements for association studies of gene-gene interaction, 2002, American Journal of Epidemiology.

[9]  The International HapMap Consortium. The International HapMap Project, 2003, Nature.

[10]  Juliet M. Chapman, et al. Detecting Disease Associations due to Linkage Disequilibrium Using Haplotype Tags: A Class of Tests and the Determinants of Statistical Power, 2003, Human Heredity.

[11]  R. Elston, et al. Optimal two‐stage genotyping in population‐based association studies, 2003, Genetic Epidemiology.

[13]  Daniel O. Stram. Tag SNP selection for association studies, 2004, Genetic Epidemiology.

[14]  Nathaniel Rothman, et al. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies, 2004, Journal of the National Cancer Institute.

[15]  K. Ozaki, et al. Genome-wide association study to identify SNPs conferring risk of myocardial infarction and their functional analyses, 2005, Cellular and Molecular Life Sciences CMLS.

[16]  M. Daly, et al. Genome-wide association studies for common diseases and complex traits, 2005, Nature Reviews Genetics.

[17]  M. Olivier. A haplotype map of the human genome, 2003, Nature.

[18]  D. Duggan, et al. Recent developments in genomewide association scans: a workshop summary and review, 2005, American Journal of Human Genetics.

[20]  D. Clayton, et al. Genome-wide association studies: theoretical and practical concerns, 2005, Nature Reviews Genetics.

[21]  A. Clark, et al. Application of the stepwise focusing method to optimize the cost-effectiveness of genome-wide association studies with limited research budgets for genotyping and phenotyping, 2005, Annals of Human Genetics.

[22]  J. Ott, et al. Complement Factor H Polymorphism in Age-Related Macular Degeneration, 2005, Science.