Effect of Population Stratification on Case-Control Association Studies

There has been considerable debate in the literature concerning bias in case-control association mapping studies due to population stratification. In this paper, we analyze the effects of population stratification theoretically by measuring the inflation in the type I error (false-positive rate) of the association test. Using a model of stratified sampling, we derive an exact expression for the type I error as a function of population parameters and sample size. We give necessary and sufficient conditions for the bias to vanish when there is no statistical association between disease and marker genotype within any of the subpopulations making up the total population. We also investigate how the bias varies as the number of subpopulations increases and show, both theoretically and by simulation, that the bias can sometimes be quite substantial even with a very large number of subpopulations. In a companion simulation-based paper (Heiman et al., Part I, this issue), we focus on the confounding risk ratio (CRR) and its relationship to the type I error in the case of two subpopulations, and quantify the magnitude of the type I error that can occur with relatively low CRR values.
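To make the mechanism concrete, the following is a minimal sketch of how stratification alone can inflate the type I error. It is not the exact expression derived in the paper, which is not reproduced in this abstract. It assumes a binary marker (for example, carrier status), equal numbers of cases and controls, no disease-marker association within any subpopulation, and a normal approximation to a two-sided two-proportion test. The function names and the parameters `weights`, `prevalences`, and `marker_freqs` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def case_control_marker_freqs(weights, prevalences, marker_freqs):
    """Marker frequency among cases and among controls in the pooled population.

    weights[k]      : proportion of the total population in subpopulation k
    prevalences[k]  : disease prevalence in subpopulation k
    marker_freqs[k] : marker frequency in subpopulation k (same for cases and
                      controls, i.e. no association within a subpopulation)
    """
    case_mass = [w * d for w, d in zip(weights, prevalences)]
    ctrl_mass = [w * (1 - d) for w, d in zip(weights, prevalences)]
    g_case = sum(m * g for m, g in zip(case_mass, marker_freqs)) / sum(case_mass)
    g_ctrl = sum(m * g for m, g in zip(ctrl_mass, marker_freqs)) / sum(ctrl_mass)
    return g_case, g_ctrl


def approx_type1_error(weights, prevalences, marker_freqs, n, alpha=0.05):
    """Approximate rejection rate with n cases and n controls (assumed design)."""
    g_case, g_ctrl = case_control_marker_freqs(weights, prevalences, marker_freqs)
    delta = g_case - g_ctrl  # spurious difference caused by stratification
    pooled = (g_case + g_ctrl) / 2
    # Standard error the test assumes under the null (pooled variance).
    s_null = sqrt(2 * pooled * (1 - pooled) / n)
    # Actual standard error of the observed frequency difference.
    s_true = sqrt((g_case * (1 - g_case) + g_ctrl * (1 - g_ctrl)) / n)
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    lower = std_normal.cdf((-z * s_null - delta) / s_true)
    upper = 1 - std_normal.cdf((z * s_null - delta) / s_true)
    return lower + upper


# Two equally sized subpopulations that differ in both prevalence and
# marker frequency (illustrative values, not taken from the paper).
inflated = approx_type1_error([0.5, 0.5], [0.01, 0.03], [0.2, 0.6], n=500)
# Same marker frequencies, but equal prevalence: no spurious difference.
nominal = approx_type1_error([0.5, 0.5], [0.02, 0.02], [0.2, 0.6], n=500)
print(f"stratified: {inflated:.3f}, equal prevalence: {nominal:.3f}")
```

In this sketch the rejection rate stays at the nominal level whenever the marker frequency is the same among cases and controls in the pooled population, for instance when all subpopulations share the same prevalence or the same marker frequency. Otherwise the spurious difference `delta` is fixed while the standard error shrinks with `n`, so the inflation grows with sample size.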
