Spurious Genetic Associations

BACKGROUND Genetic association studies are widely used in biomedical research, yet only a minority of positive findings withstand replication. I explored the capacity of association studies to produce false positive findings and the impact of various definitions of replication.

METHODS I analyzed genetically realistic simulated data reflecting a typical genotyping and analytic approach for 10 single nucleotide polymorphisms (SNPs) in COMT, a commonly studied candidate gene.

RESULTS Candidate gene studies like those simulated here are highly likely to produce one or more false positive findings at α ≤ .05. The pattern of findings can often appear "compelling" or "intriguing," and false positive findings propagate through and confuse the literature unless replication is precisely defined.

CONCLUSIONS Findings from single association studies constitute "tentative knowledge" and must be interpreted with exceptional caution. For the association method to function as intended, every statistical comparison must be tracked and reported, and integrated replication is essential. Interpreting multiple association studies requires precise replication: the same SNPs, the same phenotype, and the same direction of association.
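Two quantitative points in this abstract can be illustrated with a toy simulation. If 10 SNPs were tested independently at α = .05, the chance of at least one false positive would be 1 − 0.95^10 ≈ 0.40; correlated SNPs lower this somewhat, but nowhere near .05. Loose definitions of replication are also far easier to satisfy by chance than precise ones. The sketch below is *not* the author's simulation, which used genetically realistic COMT data. It assumes a hypothetical null model in which cases and controls come from the same population, with ten SNPs of equal minor allele frequency, a crude "copy the preceding allele" model of linkage disequilibrium, and an allelic chi-square test. All function names and parameter values are illustrative assumptions.

```python
import math
import random


def allelic_chi2_p(a_case, n_case, a_ctrl, n_ctrl):
    """P-value of a 2x2 allelic chi-square test (1 df, no continuity correction).

    a_case / a_ctrl: minor-allele counts; n_case / n_ctrl: total allele counts.
    """
    table = [[a_case, n_case - a_case], [a_ctrl, n_ctrl - a_ctrl]]
    rows = [n_case, n_ctrl]
    cols = [a_case + a_ctrl, (n_case - a_case) + (n_ctrl - a_ctrl)]
    total = n_case + n_ctrl
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def draw_haplotype(rng, n_snps, maf, ld):
    """Toy LD model (assumption): each allele copies the preceding SNP's allele
    with probability `ld`; otherwise it is drawn fresh with frequency `maf`."""
    hap = []
    for k in range(n_snps):
        if k > 0 and rng.random() < ld:
            hap.append(hap[-1])
        else:
            hap.append(1 if rng.random() < maf else 0)
    return hap


def null_study(rng, n_per_group, n_snps, maf, ld, alpha):
    """One case-control study with NO true association.

    Returns the set of (snp_index, direction) pairs with p <= alpha, where
    direction is +1 if the minor allele is more common in cases, else -1."""
    n_alleles = 2 * n_per_group  # two haplotypes per person
    case_counts = [0] * n_snps
    ctrl_counts = [0] * n_snps
    for counts in (case_counts, ctrl_counts):
        for _ in range(n_alleles):
            for k, allele in enumerate(draw_haplotype(rng, n_snps, maf, ld)):
                counts[k] += allele
    hits = set()
    for k in range(n_snps):
        p = allelic_chi2_p(case_counts[k], n_alleles, ctrl_counts[k], n_alleles)
        if p <= alpha:
            direction = 1 if case_counts[k] > ctrl_counts[k] else -1
            hits.add((k, direction))
    return hits


def independent_fwer(n_tests, alpha):
    """Chance of >= 1 false positive if all tests were independent."""
    return 1 - (1 - alpha) ** n_tests


def simulate(n_pairs=200, n_per_group=100, n_snps=10, maf=0.3, ld=0.5,
             alpha=0.05, seed=1):
    """Pairs an 'original' null study with an independent null 'replication'."""
    rng = random.Random(seed)
    any_fp = loose = precise = 0
    for _ in range(n_pairs):
        first = null_study(rng, n_per_group, n_snps, maf, ld, alpha)
        second = null_study(rng, n_per_group, n_snps, maf, ld, alpha)
        any_fp += bool(first)                  # >= 1 false positive in study 1
        loose += bool(first) and bool(second)  # "replicated": any SNP, any direction
        precise += bool(first & second)        # same SNP and same direction
    return {
        "any_false_positive": any_fp / n_pairs,
        "loose_replication": loose / n_pairs,
        "precise_replication": precise / n_pairs,
    }


if __name__ == "__main__":
    print("independent-test bound:", round(independent_fwer(10, 0.05), 3))  # 0.401
    print(simulate())
```

Because `loose_replication` only requires each study to report some significant SNP, it is typically several times larger than `precise_replication`, which requires a match on SNP and direction. Under this toy model, that gap is how chance findings can appear to "replicate" when the definition is loose.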
