Required sample size and nonreplicability thresholds for heterogeneous genetic associations

Many gene–disease associations proposed to date have not been consistently replicated across different populations. Nonreplication often reflects false positives in the original claims. Occasionally, however, it may stem from between-study heterogeneity caused by biases or by genuine diversity of the genetic effects in different populations. Here, we propose methods for estimating the sample size required to replicate an association across many studies with different amounts of between-study heterogeneity, when data are summarized through meta-analysis. We demonstrate thresholds of between-study heterogeneity (τ₀²) above which one cannot reach adequate power to replicate a proposed association at a specified level of statistical significance when k studies are performed, regardless of how large these studies are. Based on empirical evidence from 91 proposed gene–disease associations (50 on candidate genes and 41 from genome-wide association efforts), the observed between-study heterogeneity is often close to, or even surpasses, the nonreplicability thresholds. Even with more modest between-study heterogeneity, the required sample size increases considerably compared with the case of no between-study heterogeneity, and the increase becomes steep as τ₀² is approached. Therefore, some true associations may not be practically possible to replicate consistently, no matter how large the studies are. Efforts should be made to minimize between-study heterogeneity in targeted genetic effects.
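The threshold behaviour described above can be illustrated with a minimal sketch. It is not taken from the paper's methods; it assumes a textbook random-effects setup with k equal-sized studies, a true mean effect theta (for example a log odds ratio), within-study variance sigma2/n, and a summary-estimate variance of (sigma2/n + tau2)/k. Under a normal approximation that ignores the opposite tail, the variance left over as n grows without bound is tau2/k, which gives a heterogeneity ceiling tau0² = k·theta²/(z_alpha + z_beta)². The function names and all parameters are hypothetical.

```python
import math
from statistics import NormalDist


def nonreplicability_threshold(theta, k, alpha=0.05, power=0.8):
    """Heterogeneity ceiling tau0^2 under the assumptions stated in the lead-in.

    Even with infinitely large studies the summary variance is tau2 / k,
    so the target power is unreachable once tau2 >= k * theta^2 / (z_a + z_b)^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # target power
    return k * theta ** 2 / (z_alpha + z_beta) ** 2


def required_n_per_study(theta, k, tau2, sigma2, alpha=0.05, power=0.8):
    """Per-study sample size needed for the target power (assumed model).

    Requires sigma2 / n + tau2 <= tau0^2, i.e. n >= sigma2 / (tau0^2 - tau2).
    Returns math.inf when tau2 is at or above the threshold.
    """
    tau0_sq = nonreplicability_threshold(theta, k, alpha, power)
    if tau2 >= tau0_sq:
        return math.inf
    return sigma2 / (tau0_sq - tau2)


if __name__ == "__main__":
    theta = math.log(1.2)  # hypothetical effect: odds ratio of 1.2
    k = 10                 # hypothetical number of replication studies
    sigma2 = 16.0          # hypothetical per-subject variance of the estimate
    tau0_sq = nonreplicability_threshold(theta, k)
    print(f"threshold tau0^2 = {tau0_sq:.4f}")
    # Required n rises steeply as tau2 approaches the threshold.
    for frac in (0.0, 0.5, 0.9, 0.99, 1.0):
        n = required_n_per_study(theta, k, frac * tau0_sq, sigma2)
        print(f"tau2 = {frac:.2f} * tau0^2 -> n per study = {n:.0f}")
```

Because the required n scales as 1/(tau0² − tau2) in this sketch, halving the remaining gap to the threshold doubles the per-study sample size, which is one way to read the steep increases noted in the abstract.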
