Fine mapping versus replication in whole-genome association studies.

Association replication studies have a poor track record and, even when successful, often claim association with different markers, alleles, and phenotypes than those reported in the primary study. It is unknown whether these outcomes reflect genuine associations or false-positive results. A greater understanding of these observations is essential for genomewide association (GWA) studies, since they have the potential to identify multiple new associations that will require external validation. Theoretically, a repeat association with precisely the same variant in an independent sample is the gold standard for replication, but testing additional variants is commonplace in replication studies. Finding different associated SNPs within the same gene or region as that originally identified is often reported as confirmatory evidence. Here, we compare the probability of replicating a gene or region under two commonly used marker-selection strategies: an "exact" approach that involves only the originally significant markers and a "local" approach that involves both the originally significant markers and others in the same region. When a region of high intermarker linkage disequilibrium is tested to replicate an initial finding that shows only weak association with disease, the local approach is a good strategy. Otherwise, the most powerful and efficient strategy for replication involves testing only the initially identified variants. Association with a marker other than that originally identified can occur frequently in a low-powered replication study, even in the presence of real effects, and instances of such association increase as the number of included variants increases. Our results provide a basis for the design and interpretation of GWA replication studies and point to the importance of a clear distinction between fine mapping and replication after GWA.
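The trade-off between the "exact" and "local" strategies can be illustrated with a rough calculation. The sketch below is not the method used in the study: it assumes two-sided z-tests, a Bonferroni correction across the markers tested in the region, and statistically independent tests (an assumption that ignores linkage disequilibrium and is made here only to keep the arithmetic simple). The function names and the noncentrality values are hypothetical and chosen for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_sided(ncp, alpha):
    """Power of a two-sided z-test with noncentrality `ncp` at level `alpha`."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)


def exact_power(ncp_original, alpha=0.05):
    """'Exact' strategy: test only the originally significant marker."""
    return power_two_sided(ncp_original, alpha)


def local_power_independent(ncps, alpha=0.05):
    """'Local' strategy: test every marker in the region at alpha / m.

    Returns the probability that at least one marker is significant.
    ASSUMPTION: the tests are treated as independent, which is not true
    for markers in linkage disequilibrium.
    """
    m = len(ncps)
    prob_all_miss = 1.0
    for ncp in ncps:
        prob_all_miss *= 1 - power_two_sided(ncp, alpha / m)
    return 1 - prob_all_miss


# Scenario A (illustrative values): the original marker carries the signal
# and nine added markers carry none, so the correction only costs power.
scenario_a_exact = exact_power(2.5)
scenario_a_local = local_power_independent([2.5] + [0.0] * 9)

# Scenario B (illustrative values): the original marker is a weak proxy
# and one added marker captures the signal much better.
scenario_b_exact = exact_power(1.0)
scenario_b_local = local_power_independent([1.0, 3.5])

print(f"A: exact={scenario_a_exact:.3f} local={scenario_a_local:.3f}")
print(f"B: exact={scenario_b_exact:.3f} local={scenario_b_local:.3f}")
```

Under these assumptions, adding uninformative markers lowers the chance of replicating the region (scenario A), whereas adding a marker that tags the signal better than the original one raises it (scenario B). A realistic comparison would model the correlation between markers rather than assume independence.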
