Effect of Population Stratification on Case-Control Association Studies

Objectives: This is the first of two articles discussing the effect of population stratification (PS) on the type I error rate (i.e., the false positive rate). This paper focuses on the confounding risk ratio (CRR). It is accepted that PS can produce false positive results in case-control genetic association studies. However, it is not known which values of the population parameters lead to an increase in the type I error rate. Some believe that PS does not represent a serious concern [1, 2], whereas others believe that PS may contribute to contradictory findings in genetic association studies [3]. We used computer simulations to estimate the effect of PS on the type I error rate over a wide range of disease prevalences and marker allele frequencies, and we compared the observed type I error rate with the magnitude of the CRR.

Methods: We simulated two populations and mixed them to produce a combined population, specifying 160 different combinations of input parameters (the disease prevalence and the marker allele frequency in each of the two populations). From each combined population, we sampled 5000 case-control datasets, each with either 50, 100, or 300 cases and controls, and estimated the type I error rate. In all simulations, the marker allele and the disease were independent within each population (i.e., there was no true association).

Results: The type I error rate is not substantially affected by changes in disease prevalence per se. The CRR is a relatively poor indicator of the magnitude of the increase in the type I error rate. We also derived a simple mathematical quantity, Δ, that is highly correlated with the type I error rate. In the companion article (part II, in this issue) [4], we extend this work to multiple subpopulations and unequal sampling proportions.

Conclusion: Based on these results, realistic combinations of disease prevalences and marker allele frequencies can substantially increase the probability of finding false evidence of marker-disease associations. Furthermore, the CRR does not indicate when this will occur.
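
The CRR can be made concrete for the two-population setting with a short calculation. The Python sketch below is illustrative only and rests on assumptions not stated in the abstract: the CRR is taken as the ratio of the crude (pooled) risk ratio to the true within-population risk ratio, which equals 1 here because the marker has no effect on disease in either population; "exposure" is carrier status (at least one copy of the marker allele) under Hardy-Weinberg equilibrium within each population; and `w` is the proportion of the combined population drawn from population 1. The function and parameter names are hypothetical, and the paper's exact parameterization may differ.

```python
def confounding_risk_ratio(w, k1, k2, p1, p2):
    """Crude risk ratio for marker carriers vs non-carriers in a mixture of two
    populations where the marker has no effect on disease within either
    population (hypothetical helper; assumptions described in the text).

    w      -- proportion of the combined population from population 1
    k1, k2 -- disease prevalence in populations 1 and 2
    p1, p2 -- marker allele frequency in populations 1 and 2
    """
    # Carrier frequency under Hardy-Weinberg equilibrium (assumption).
    c1 = 1 - (1 - p1) ** 2
    c2 = 1 - (1 - p2) ** 2
    # Share of the combined population that is exposed / unexposed, by population.
    exp1, exp2 = w * c1, (1 - w) * c2
    unexp1, unexp2 = w * (1 - c1), (1 - w) * (1 - c2)
    # Disease risk among carriers and non-carriers in the pooled population.
    risk_exposed = (exp1 * k1 + exp2 * k2) / (exp1 + exp2)
    risk_unexposed = (unexp1 * k1 + unexp2 * k2) / (unexp1 + unexp2)
    # The true risk ratio is 1, so crude RR / true RR is just the crude RR.
    return risk_exposed / risk_unexposed
```

For example, `confounding_risk_ratio(0.5, 0.2, 0.02, 0.8, 0.2)` returns about 4.93, while setting `k1 == k2` or `p1 == p2` gives a CRR of 1. Note that the function takes no sample-size argument, whereas the type I error rate of a test depends on the numbers of cases and controls; this is one way to see why a population-level quantity like the CRR cannot, on its own, tell us the false positive rate of a particular study.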

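The simulation design described under Methods can be sketched as follows. This is a minimal Python illustration, not the authors' code, and it relies on assumptions the abstract does not state: Hardy-Weinberg equilibrium within each population, individuals classified as marker carriers or non-carriers, a Pearson chi-square test on the resulting 2×2 table (no continuity correction) at α = 0.05, and a mixing proportion `w` for population 1. All function and parameter names are hypothetical. Because the marker and the disease are independent within each population, every "significant" result counts as a false positive.

```python
import math
import random


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for the
    2x2 table [[a, b], [c, d]]; returns the p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty margin makes the test undefined; treat as no evidence
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def simulate_type1_error(w, k1, k2, p1, p2, n_per_group,
                         n_reps=5000, alpha=0.05, seed=1):
    """Estimate the type I error rate for case-control samples drawn from a
    mixture of two populations in which marker and disease are independent.
    w: proportion of population 1; k1, k2: disease prevalences;
    p1, p2: marker allele frequencies (hypothetical parameter names)."""
    rng = random.Random(seed)
    # Bayes' rule: probability that a sampled case / control comes from population 1.
    case_pop1 = w * k1 / (w * k1 + (1 - w) * k2)
    ctrl_pop1 = w * (1 - k1) / (w * (1 - k1) + (1 - w) * (1 - k2))

    def count_carriers(n_people, prob_pop1):
        carriers = 0
        for _ in range(n_people):
            p = p1 if rng.random() < prob_pop1 else p2
            # Carrier = at least one marker allele (Hardy-Weinberg assumption).
            if rng.random() < 1 - (1 - p) ** 2:
                carriers += 1
        return carriers

    false_positives = 0
    for _ in range(n_reps):
        case_c = count_carriers(n_per_group, case_pop1)
        ctrl_c = count_carriers(n_per_group, ctrl_pop1)
        p_value = chi2_2x2_pvalue(case_c, n_per_group - case_c,
                                  ctrl_c, n_per_group - ctrl_c)
        if p_value < alpha:
            false_positives += 1
    return false_positives / n_reps
```

For instance, `simulate_type1_error(0.5, 0.2, 0.02, 0.8, 0.2, n_per_group=100)` mixes two populations that differ in both prevalence and allele frequency. With `k1 == k2` (or `p1 == p2`) cases and controls share the same carrier frequency, so the estimated rate should stay close to `alpha`; when both quantities differ between the populations it rises above `alpha`. Classifying individuals rather than alleles keeps the observations in each group independent; an allele-count test would add extra inflation from the Hardy-Weinberg disequilibrium that the mixing itself creates.
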
[1] R. Todd, et al. Genetic association between monoamine oxidase and manic-depressive illness: comparison of relative risk and haplotype relative risk data. American Journal of Medical Genetics, 1997.

[2] J. Ioannidis, et al. Replication validity of genetic association studies. Nature Genetics, 2001.

[3] P. McKeigue, et al. Problems of reporting genetic associations with complex outcomes. The Lancet, 2003.

[4] L. Cardon, et al. Population stratification and spurious allelic association. The Lancet, 2003.

[5] Jacob Cohen. The earth is round (p < .05). 1994.

[6] P. Donnelly, et al. Case-control studies of association in structured or admixed populations. Theoretical Population Biology, 2001.

[7] P. Holgate. A mathematical study of the founder principle of evolutionary genetics. Journal of Applied Probability, 1966.

[8] J. Witte, et al. Association between a CYP3A4 Genetic Variant and Clinical Presentation in African-American Prostate Cancer Patients. 2000.

[9] N. Risch. Searching for genetic determinants in the new millennium. Nature, 2000.

[10] William J. Blot, et al. Atlas of Cancer Mortality in the United States 1950-94. 2000.

[11] J. Hirschhorn, et al. A comprehensive review of genetic association studies. Genetics in Medicine, 2002.

[12] M. Silink, et al. Childhood Diabetes: A Global Perspective. Hormone Research in Paediatrics, 2004.

[13] Nathaniel Rothman, et al. Counterpoint: bias from population stratification is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer. Cancer Epidemiology, Biomarkers & Prevention, 2002.

[14] M. Spence, et al. Simulated data for a complex genetic trait (Problem 2 for GAW11): How the model was developed, and why. Genetic Epidemiology, 1999.

[15] S. Neuhausen. Founder populations and their uses for breast cancer genetics. Breast Cancer Research, 2000.

[16] Xiaolin Zhu, et al. Qualitative Semi-Parametric Test for Genetic Associations in Case-Control Designs Under Structured Populations. Annals of Human Genetics, 2003.

[17] John S. Witte, et al. Point: population stratification: a problem for case-control studies of candidate-gene associations? Cancer Epidemiology, Biomarkers & Prevention, 2002.

[18] E. H. Simpson, et al. The Interpretation of Interaction in Contingency Tables. 1951.

[19] A. Jemal, et al. Global cancer statistics. CA: A Cancer Journal for Clinicians, 2011.

[20] R. Williams, et al. Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. American Journal of Human Genetics, 1988.

[21] Kei-Hoi Cheung, et al. ALFRED: the ALelle FREquency Database. Update. Nucleic Acids Research, 2003.

[22] D. Reich, et al. Detecting association in a case-control study while correcting for population stratification. Genetic Epidemiology, 2001.

[23] K. Roeder, et al. Genomic Control for Association Studies. Biometrics, 1999.

[24] Thomas A. Trikalinos, et al. Genetic associations in large versus small studies: an empirical assessment. The Lancet, 2003.

[25] D. Schaid. Disease-Marker Association. 2005.

[26] J. Knottnerus, et al. Systematic Review and Meta-analysis of Incidence Studies of Epilepsy and Unprovoked Seizures. Epilepsia, 2002.

[27] R. Ackerman, et al. Interleukin-1 Receptor Antagonist Gene Polymorphisms in Carotid Atherosclerosis. Stroke, 2003.

[28] D. Greenberg, et al. Case-Control Association Studies in Mixed Populations: Correcting Using Genomic Control. Human Heredity, 2005.

[29] W. Haenszel, et al. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 1959.

[30] N. Risch, et al. The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases I. DNA pooling. Genome Research, 1998.

[31] K. K. Kidd, et al. Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. Proceedings of the National Academy of Sciences of the United States of America, 1991.

[32] D. Allison, et al. Nonreplication in genetic association studies of obesity and diabetes research. The Journal of Nutrition, 2003.

[33] L. Wasserman, et al. Genomic control, a new approach to genetic-based association studies. Theoretical Population Biology, 2001.

[34] W. Ewens, et al. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). American Journal of Human Genetics, 1993.

[35] R. Myers, et al. Candidate-gene approaches for studying complex genetic traits: practical considerations. Nature Reviews Genetics, 2002.

[36] C. Falk, et al. Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations. Annals of Human Genetics, 1987.

[37] S. Gabriel, et al. Assessing the impact of population stratification on genetic association studies. Nature Genetics, 2004.

[38] N. Rothman, et al. Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. Journal of the National Cancer Institute, 2000.

[39] I. Andrulis, et al. Genetic Variants of GPX1 and SOD2 and Breast Cancer Risk at the Ontario Site of the Breast Cancer Family Registry. Cancer Epidemiology, Biomarkers & Prevention, 2004.

[40] J. Pritchard, et al. Use of unlinked genetic markers to detect population stratification in association studies. American Journal of Human Genetics, 1999.

[41] M. Nei. Molecular Evolutionary Genetics. 1987.

[42] W. Ewens, et al. The transmission/disequilibrium test: history, subdivision, and admixture. American Journal of Human Genetics, 1995.

[43] Robert C. Elston, et al. Biostatistical genetics and genetic epidemiology. 2002.

[44] Susan E. Hodge, et al. The emperor's new methods. American Journal of Human Genetics, 2003.

[45] Susan E. Hodge, et al. Effect of Population Stratification on Case-Control Association Studies. Human Heredity, 2004.

[46] P. Donnelly, et al. The effects of human population structure on large genetic association studies. Nature Genetics, 2004.

[47] J. Oxford, et al. Oxford. Leaving The Arena, 1968.