Effect of Population Stratification on Case-Control Association Studies

Objectives: This is the first of two articles discussing the effect of population stratification on the type I error rate (i.e., false positive rate). This paper focuses on the confounding risk ratio (CRR). It is accepted that population stratification (PS) can produce false positive results in case-control genetic association studies. However, it is not known which values of the population parameters increase the type I error rate. Some believe PS does not represent a serious concern [1, 2], whereas others believe that PS may contribute to contradictory findings in genetic association studies [3]. We used computer simulations to estimate the effect of PS on the type I error rate over a wide range of disease frequencies and marker allele frequencies, and we compared the observed type I error rate to the magnitude of the confounding risk ratio.

Methods: We simulated two populations and mixed them to produce a combined population, specifying 160 different combinations of input parameters (disease prevalences and marker allele frequencies in the two populations). From the combined populations, we selected 5000 case-control datasets, each with either 50, 100, or 300 cases and controls, and determined the type I error rate. In all simulations, the marker allele and disease were independent (i.e., no association).

Results: The type I error rate is not substantially affected by changes in the disease prevalence per se. We found that the CRR provides a relatively poor indicator of the magnitude of the increase in type I error rate. We also derived a simple mathematical quantity, Δ, that is highly correlated with the type I error rate. In the companion article (part II, in this issue) [4], we extend this work to multiple subpopulations and unequal sampling proportions.

Conclusion: Based on these results, realistic combinations of disease prevalences and marker allele frequencies can substantially increase the probability of finding false evidence of marker-disease associations. Furthermore, the CRR does not indicate when this will occur.
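The simulation design described under Methods can be illustrated with a minimal Python sketch. It is not the authors' program. The abstract does not state the mixing proportions, the genetic model, or the test statistic, so the sketch assumes equal mixing of the two populations, Hardy-Weinberg proportions within each population, and an allele-count chi-square test at the 5% level. All function and parameter names (`expected_allele_freqs`, `type1_error_rate`, `mix1`, and so on) are hypothetical. Prevalences are assumed to lie strictly between 0 and 1.

```python
import random

CHI2_CRIT_1DF = 3.841  # 5% critical value of chi-square with 1 df


def expected_allele_freqs(k1, k2, p1, p2, mix1=0.5):
    """Expected marker allele frequency among cases and among controls.

    k1, k2: disease prevalences; p1, p2: marker allele frequencies;
    mix1: assumed fraction of the combined population from population 1.
    Marker and disease are independent within each population.
    """
    # Fraction of cases (controls) that originate from population 1.
    case_w1 = mix1 * k1 / (mix1 * k1 + (1 - mix1) * k2)
    ctrl_w1 = mix1 * (1 - k1) / (mix1 * (1 - k1) + (1 - mix1) * (1 - k2))
    case_freq = case_w1 * p1 + (1 - case_w1) * p2
    ctrl_freq = ctrl_w1 * p1 + (1 - ctrl_w1) * p2
    return case_w1, ctrl_w1, case_freq, ctrl_freq


def draw_allele_count(n, w1, p1, p2, rng):
    """Count marker alleles in n individuals (two alleles each)."""
    count = 0
    for _ in range(n):
        p = p1 if rng.random() < w1 else p2  # pick the source population
        count += (rng.random() < p) + (rng.random() < p)
    return count


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a zero margin carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def type1_error_rate(k1, k2, p1, p2, n_per_group=100,
                     n_datasets=5000, mix1=0.5, seed=0):
    """Fraction of simulated datasets that reject the null at the 5% level."""
    rng = random.Random(seed)
    case_w1, ctrl_w1, _, _ = expected_allele_freqs(k1, k2, p1, p2, mix1)
    total_alleles = 2 * n_per_group
    rejections = 0
    for _ in range(n_datasets):
        a = draw_allele_count(n_per_group, case_w1, p1, p2, rng)
        c = draw_allele_count(n_per_group, ctrl_w1, p1, p2, rng)
        stat = chi2_2x2(a, total_alleles - a, c, total_alleles - c)
        rejections += stat > CHI2_CRIT_1DF
    return rejections / n_datasets
```

For example, `type1_error_rate(0.05, 0.05, 0.3, 0.3)` simulates two identical populations and should return a value near the nominal 0.05, whereas giving the two populations different prevalences and different allele frequencies shifts the expected allele frequencies in cases and controls apart and inflates the rejection rate. Drawing the source population per individual, rather than building a large combined population and sampling from it, is a shortcut chosen here for brevity.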

[1]  E. H. Simpson,et al.  The Interpretation of Interaction in Contingency Tables , 1951 .

[2]  W. Haenszel,et al.  Statistical aspects of the analysis of data from retrospective studies of disease. , 1959, Journal of the National Cancer Institute.

[3]  P. Holgate A mathematical study of the founder principle of evolutionary genetics , 1966, Journal of Applied Probability.

[4]  J. Oxford,et al.  Oxford , 1968, Leaving The Arena.

[5]  M. Nei Molecular Evolutionary Genetics , 1987 .

[6]  C. Falk,et al.  Haplotype relative risks: an easy reliable way to construct a proper control sample for risk calculations , 1987, Annals of human genetics.

[7]  R. Williams,et al.  Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. , 1988, American journal of human genetics.

[8]  K K Kidd,et al.  Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[9]  W. Ewens,et al.  Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). , 1993, American journal of human genetics.

[10]  Jacob Cohen The earth is round (p < .05) , 1994 .

[11]  W. Ewens,et al.  The transmission/disequilibrium test: history, subdivision, and admixture. , 1995, American journal of human genetics.

[12]  R. Todd,et al.  Genetic association between monoamine oxidase and manic-depressive illness: comparison of relative risk and haplotype relative risk data. , 1997, American journal of medical genetics.

[13]  N Risch,et al.  The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases I. DNA pooling. , 1998, Genome research.

[14]  A. Jemal,et al.  Global cancer statistics , 2011, CA: a cancer journal for clinicians.

[15]  M. Spence,et al.  Simulated data for a complex genetic trait (Problem 2 for GAW11): How the model was developed, and why , 1999, Genetic epidemiology.

[16]  J. Pritchard,et al.  Use of unlinked genetic markers to detect population stratification in association studies. , 1999, American journal of human genetics.

[17]  K. Roeder,et al.  Genomic Control for Association Studies , 1999, Biometrics.

[18]  N. Risch Searching for genetic determinants in the new millennium , 2000, Nature.

[19]  S. Neuhausen Founder populations and their uses for breast cancer genetics , 2000, Breast Cancer Research.

[20]  William J. Blot,et al.  Atlas of Cancer Mortality in the United States 1950-94 , 2000 .

[21]  J. Witte,et al.  Association between a CYP3A4 Genetic Variant and Clinical Presentation in African-American Prostate Cancer Patients , 2000 .

[22]  N. Rothman,et al.  Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias. , 2000, Journal of the National Cancer Institute.

[23]  P. Donnelly,et al.  Case-control studies of association in structured or admixed populations. , 2001, Theoretical population biology.

[24]  J. Ioannidis,et al.  Replication validity of genetic association studies , 2001, Nature Genetics.

[25]  L. Wasserman,et al.  Genomic control, a new approach to genetic-based association studies. , 2001, Theoretical population biology.

[26]  D. Reich,et al.  Detecting association in a case‐control study while correcting for population stratification , 2001, Genetic epidemiology.

[27]  Nathaniel Rothman,et al.  Counterpoint: bias from population stratification is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer. , 2002, Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology.

[28]  R. Myers,et al.  Candidate-gene approaches for studying complex genetic traits: practical considerations , 2002, Nature Reviews Genetics.

[29]  M. Silink,et al.  Childhood Diabetes: A Global Perspective , 2004, Hormone Research in Paediatrics.

[30]  John S Witte,et al.  Point: population stratification: a problem for case-control studies of candidate-gene associations? , 2002, Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology.

[31]  J. Hirschhorn,et al.  A comprehensive review of genetic association studies , 2002, Genetics in Medicine.

[32]  Robert C. Elston,et al.  Biostatistical genetics and genetic epidemiology , 2002 .

[33]  J. Knottnerus,et al.  Systematic Review and Meta‐analysis of Incidence Studies of Epilepsy and Unprovoked Seizures , 2002, Epilepsia.

[34]  Thomas A Trikalinos,et al.  Genetic associations in large versus small studies: an empirical assessment , 2003, The Lancet.

[35]  L. Cardon,et al.  Population stratification and spurious allelic association , 2003, The Lancet.

[36]  P. McKeigue,et al.  Problems of reporting genetic associations with complex outcomes , 2003, The Lancet.

[37]  Kei-Hoi Cheung,et al.  ALFRED: the ALelle FREquency Database. Update , 2003, Nucleic Acids Res..

[38]  Susan E Hodge,et al.  The emperor's new methods. , 2003, American journal of human genetics.

[39]  R. Ackerman,et al.  Interleukin-1 Receptor Antagonist Gene Polymorphisms in Carotid Atherosclerosis , 2003, Stroke.

[40]  S. Zhang,et al.  Qualitative Semi‐Parametric Test for Genetic Associations in Case‐Control Designs Under Structured Populations , 2003, Annals of human genetics.

[41]  D. Allison,et al.  Nonreplication in genetic association studies of obesity and diabetes research. , 2003, The Journal of nutrition.

[42]  I. Andrulis,et al.  Genetic Variants of GPX1 and SOD2 and Breast Cancer Risk at the Ontario Site of the Breast Cancer Family Registry , 2004, Cancer Epidemiology Biomarkers & Prevention.

[43]  S. Gabriel,et al.  Assessing the impact of population stratification on genetic association studies , 2004, Nature Genetics.

[44]  Susan E. Hodge,et al.  Effect of Population Stratification on Case-Control Association Studies , 2004, Human Heredity.

[45]  Case-Control Association Studies in Mixed Populations: Correcting Using Genomic Control , 2005, Human Heredity.

[46]  P. Donnelly,et al.  The effects of human population structure on large genetic association studies , 2004, Nature Genetics.

[47]  D. Schaid Disease-Marker Association , 2005 .