Is Replication the Gold Standard for Validating Genome-Wide Association Findings?

With the advent of genome-wide association (GWA) studies, researchers hope to detect reliable genetic associations with common complex human diseases and traits. Enthusiasm for GWA is growing, and a number of GWA studies have been published. A common practice in the field is to treat replication as the gold standard for validating an association finding. In this article, drawing on empirical and theoretical data, we emphasize that replication of GWA findings can be quite difficult and should not always be expected, even when true variants have been identified. If the power of individual replication studies is less than 100% (which is usually the case), the probability of replication decreases as the number of independent GWA studies increases, and even a finding that is replicated is not necessarily true. We argue that the field may have unreasonably high expectations of successful replication. We also ask whether it is sufficient, or necessary, to treat replication as the ultimate gold standard for defining true variants. Finally, we discuss the usefulness of integrating evidence from multiple levels and sources, such as genetic epidemiological studies (at the DNA level), gene expression studies (at the RNA level), proteomics (at the protein level), and follow-up molecular and cellular studies, for eventual validation and for illuminating the functional relevance of the genes uncovered.
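The abstract's point about power and replication can be illustrated with a little arithmetic. The sketch below is not taken from the article. It assumes that the studies are statistically independent, that every study has the same power and the same significance threshold, and that "replication" means every study reports a significant result. The function names and the parameters `prior`, `power`, `alpha`, and `n_studies` are hypothetical, chosen only for this illustration.

```python
def prob_all_replicate(power: float, n_studies: int) -> float:
    """Probability that a TRUE association is significant in every one of
    n_studies independent studies, each with the same power (assumption)."""
    if not 0.0 < power <= 1.0:
        raise ValueError("power must be in (0, 1]")
    if n_studies < 0:
        raise ValueError("n_studies must be non-negative")
    return power ** n_studies


def prob_true_given_all_significant(
    prior: float, power: float, alpha: float, n_studies: int
) -> float:
    """Bayes' rule: probability that an association is true given that all
    n_studies independent studies were significant.

    prior -- assumed prior probability that the tested variant is truly associated
    power -- assumed per-study probability of significance when the association is true
    alpha -- assumed per-study probability of significance when it is not
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be in (0, 1)")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    true_path = prior * prob_all_replicate(power, n_studies)
    false_path = (1.0 - prior) * alpha ** n_studies
    return true_path / (true_path + false_path)


if __name__ == "__main__":
    # Illustrative numbers only; they are not estimates from any real study.
    for k in (1, 2, 3, 5):
        print(k, prob_all_replicate(0.8, k),
              prob_true_given_all_significant(1e-4, 0.8, 0.05, k))
```

Under these assumptions, a true variant studied with 80% power per study is significant in all of three studies only about half the time (0.8 cubed is roughly 0.51), so a failed replication is weak evidence against it. Conversely, the second function stays below 1 for any finite number of studies, which matches the statement that a replicated finding is not necessarily true.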
