Curses--winner's and otherwise--in genetic epidemiology.

The estimated effect of a marker allele in the initial study reporting an association is often exaggerated relative to the estimates from follow-up studies (the "winner's curse" phenomenon). This is a particular concern for genome-wide association studies, where markers typically must pass very stringent significance thresholds to be selected for replication. A related problem is the overestimation of predictive accuracy that occurs when the same data set is used both to select a multilocus risk model from a wide range of possible models and to estimate the accuracy of the final model ("over-fitting"). Even in the absence of these quantitative biases, researchers can overstate the qualitative importance of their findings--for example, by focusing on relative risks in a context where sensitivity and specificity may be more appropriate measures. Epidemiologists need to be aware of these potential problems: as authors, to avoid or minimize them, and as readers, to detect them.
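The winner's curse described above can be illustrated with a small simulation. This is a minimal sketch and not the authors' analysis. It assumes that each study's effect estimate is normally distributed around a single true effect, and that a marker is "selected" only when its z-score exceeds a threshold. The function name `simulate_winners_curse` and all parameter values (true effect, standard error, threshold) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.10, std_error=0.05, z_threshold=3.0,
                           n_studies=200_000, seed=0):
    """Compare discovery and replication estimates for 'significant' markers.

    Assumption (not from the source): estimates are drawn from
    Normal(true_effect, std_error), and selection is one-sided on z = estimate / SE.
    """
    rng = random.Random(seed)
    discovery, replication = [], []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, std_error)
        # Keep only studies that pass the significance threshold.
        if estimate / std_error > z_threshold:
            discovery.append(estimate)
            # An independent follow-up study of the same marker, with no selection.
            replication.append(rng.gauss(true_effect, std_error))
    return {
        "true_effect": true_effect,
        "n_selected": len(discovery),
        "mean_discovery_estimate": statistics.fmean(discovery),
        "mean_replication_estimate": statistics.fmean(replication),
    }


if __name__ == "__main__":
    for key, value in simulate_winners_curse().items():
        print(f"{key}: {value}")
```

Under these assumed settings every selected discovery estimate exceeds 0.15 by construction, so the mean discovery estimate sits well above the true effect of 0.10, while the mean of the unselected replication estimates stays close to it. Raising `z_threshold` (as in a genome-wide scan) widens the gap.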
