Distinguishing true from false positives in genomic studies: p values

Distinguishing true from false positive findings is a major challenge in human genetic epidemiology. Several strategies have been devised to facilitate this, including the positive predictive value (PPV) and a set of epidemiological criteria known as the “Venice” criteria. The PPV measures the probability that an association is true, given a statistically significant finding, while the Venice criteria grade credibility based on the amount of evidence, the consistency of replication and protection from bias. The vast majority of journals use significance thresholds to identify true positive findings. We studied the effect of p value thresholds on the PPV and used the PPV and the Venice criteria to define usable thresholds of statistical significance. Theoretical and empirical analyses of data published on AlzGene show that, at a nominal p value threshold of 0.05, most “positive” findings will turn out to be false if the prior probability of association is below 0.10, even if the statistical power of the study is higher than 0.80. However, even in underpowered studies (power of 0.25) with a low prior probability of 1 × 10⁻³, a p value of 1 × 10⁻⁵ yields a high PPV (>96 %). We show that a p value threshold of 1 × 10⁻⁵ gives very strong evidence of association in almost all studies. In the case of a very high prior probability of association (0.50), a p value threshold of 0.05 may be sufficient, while for studies with a very low prior probability of association (1 × 10⁻⁴; genome-wide association studies, for instance), 1 × 10⁻⁷ may serve as a useful threshold to declare significance.
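
The dependence of the PPV on prior probability, power and significance threshold can be illustrated with a short calculation. The sketch below assumes the standard Bayesian form PPV = (power × prior) / (power × prior + α × (1 − prior)) with no bias term; the abstract does not state the exact formula used in the study, so the function `ppv` and its parameter names are illustrative assumptions, not the authors' implementation.

```python
def ppv(prior, power, alpha):
    """Positive predictive value of a statistically significant finding.

    Assumed form (no bias term): true positives / all positives, where
    true positives  = power * prior
    false positives = alpha * (1 - prior)
    """
    true_pos = power * prior
    false_pos = alpha * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# Scenarios taken from the abstract: (label, prior, power, alpha).
# A power of 0.80 is assumed where the abstract gives none.
scenarios = [
    ("underpowered study, low prior", 1e-3, 0.25, 1e-5),
    ("very high prior", 0.50, 0.80, 0.05),
    ("genome-wide association study", 1e-4, 0.80, 1e-7),
]

for label, prior, power, alpha in scenarios:
    print(f"{label}: PPV = {ppv(prior, power, alpha):.4f}")
```

Under this assumed formula, the first scenario reproduces the PPV above 96 % reported in the abstract for a power of 0.25, a prior of 1 × 10⁻³ and a threshold of 1 × 10⁻⁵.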
