Population Structure in Genetic Association Studies

Standard genetic association tests using case-control data rest on assumptions about the population from which study subjects were sampled. Two types of departure from these assumptions have been studied: population stratification and cryptic relatedness. Both have been called population structure. Each can lead to erroneous inferences because a test statistic’s actual null distribution differs from the nominal one, which is valid only for populations without structure. The differences can reflect either confounding bias or variance distortion. For each type of structure, adjusted test statistics have been proposed whose actual null distributions, in the presence of the structure, equal the nominal ones appropriate for unstructured populations. This paper reviews models for population stratification and cryptic relatedness, and uses them to examine the effects of each on the Armitage trend test for case-control data. Specifically, population stratification can cause confounding bias but not variance distortion, while cryptic relatedness can cause variance distortion but not confounding bias. Consequently, the adjusted statistics developed for population stratification (e.g., the latent variable methods of Pritchard et al. (1999, 2001), Satten et al. (2001), Schork et al. (2001), and Wang et al. (2005)) address potential confounding bias but not variance distortion. Conversely, the adjusted statistics developed for cryptic relatedness (e.g., the Genomic Control (GC) methods of Devlin and Roeder (1999), Setakis et al. (2006), and Zheng et al. (2006)) address variance distortion but not confounding bias. These differences may explain the anomalous behavior of adjusted statistics when applied to populations whose structure differs in type from the one for which the method was designed. They indicate that care is needed to specify the nature of the underlying structure anticipated for a given population, and to use appropriate methods to adjust for it.
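As a rough illustration of the two quantities discussed above, the following Python sketch computes the Armitage trend statistic from genotype counts and applies a Genomic Control style rescaling. It is a minimal sketch under stated assumptions, not the paper's own implementation: the function names, the additive scores (0, 1, 2), the genotype counts, and the rule of bounding the inflation factor below by 1 are illustrative choices that do not come from the abstract. Because the GC step only divides the statistic by a constant, it changes the scale of the null distribution and cannot remove a shift caused by confounding, which is consistent with the distinction drawn above.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549364


def armitage_trend_chi2(cases, controls, scores=(0, 1, 2)):
    """Armitage trend chi-square (1 df) from genotype counts.

    cases, controls: counts of subjects carrying 0, 1, 2 copies of an allele.
    scores: genotype scores; (0, 1, 2) is an assumed additive coding.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    # Score-weighted difference between case and control genotype counts.
    U = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    # Variance of U under the null hypothesis in an unstructured population.
    var_U = (R * S / N) * (
        N * sum(x * x * n for x, n in zip(scores, totals))
        - sum(x * n for x, n in zip(scores, totals)) ** 2
    )
    if var_U == 0:
        raise ValueError("no genotype variation; the statistic is undefined")
    return U * U / var_U


def gc_inflation(null_chi2):
    """Estimate the inflation factor from trend statistics at null markers."""
    return median(null_chi2) / CHI2_1DF_MEDIAN


def gc_adjust(chi2, lam):
    """Rescale a statistic by the inflation factor (assumed bounded below by 1)."""
    return chi2 / max(lam, 1.0)


# Hypothetical counts, for illustration only.
chi2 = armitage_trend_chi2(cases=(10, 20, 30), controls=(30, 20, 10))
# Hypothetical trend statistics at markers assumed unlinked to disease.
lam = gc_inflation([0.2, 0.5, 0.9, 1.5, 3.0])
adjusted = gc_adjust(chi2, lam)
```

With these illustrative counts the unadjusted statistic is 20.0, and the adjusted value is that number divided by the estimated inflation factor.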

[1] G. A. Satten, et al. Accounting for unmeasured population substructure in case-control studies of genetic association using a novel latent-class model, 2001, American journal of human genetics.

[2] K. Roeder, et al. Genomic Control for Association Studies, 1999, Biometrics.

[3] R. Williams, et al. Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture, 1988, American journal of human genetics.

[4] D. Reich, et al. Detecting association in a case‐control study while correcting for population stratification, 2001, Genetic epidemiology.

[5] Daniel Rabinowitz, et al. A Unified Approach to Adjusting Association Tests for Population Admixture with Arbitrary Pedigree Structure and Arbitrary Missing Marker Information, 2000, Human Heredity.

[6] L. Wasserman, et al. Genomic control, a new approach to genetic-based association studies, 2001, Theoretical population biology.

[7] J. Gastwirth, et al. Robust genomic control for association studies, 2006, American journal of human genetics.

[8] N. Breslow, et al. Statistical methods in cancer research: volume 1 - The analysis of case-control studies, 1980.

[9] J. Felsenstein. Probability models and statistical methods in genetics, 1972.

[10] D. Clayton, et al. Statistical Models in Epidemiology, 1993.

[11] Daniel Rabinowitz, et al. Adjusting for Population Heterogeneity and Misspecified Haplotype Frequencies When Testing Nonparametric Null Hypotheses in Statistical Genetics, 2002.

[12] Elizabeth L. Ogburn, et al. Demonstrating stratification in a European American population, 2005, Nature Genetics.

[13] Jennifer L. Kelsey, et al. Methods in Observational Epidemiology, 1986.

[14] Birgir Hrafnkelsson, et al. An Icelandic example of the impact of population structure on association studies, 2005, Nature Genetics.

[15] P. Sasieni. From genotypes to genes: doubling the sample size, 1997, Biometrics.

[16] J. Pritchard, et al. Confounding from Cryptic Relatedness in Case-Control Association Studies, 2005, PLoS genetics.

[17] Prakash Gorroochurn, et al. Centralizing the non‐central chi‐square: a new method to correct for population stratification in genetic case‐control association studies, 2006, Genetic epidemiology.

[18] P. Donnelly, et al. Case-control studies of association in structured or admixed populations, 2001, Theoretical population biology.

[19] A. Whittemore, et al. Genetic association tests for family data with missing parental genotypes: A comparison, 2003, Genetic epidemiology.

[20] D. Clayton, et al. A generalization of the transmission/disequilibrium test for uncertain-haplotype transmission, 1999, American journal of human genetics.

[21] M. Kimura, et al. An introduction to population genetics theory, 1971.

[22] David J. Balding, et al. Logistic regression protects against population structure in genetic association studies, 2005, Genome research.

[23] J. Pritchard, et al. Use of unlinked genetic markers to detect population stratification in association studies, 1999, American journal of human genetics.

[24] N. Schork, et al. The future of genetic case-control studies, 2001, Advances in genetics.

[25] N. Rothman, et al. Population stratification in epidemiologic studies of common genetic variants and cancer: quantification of bias, 2000, Journal of the National Cancer Institute.

[26] P. Donnelly, et al. The effects of human population structure on large genetic association studies, 2004, Nature Genetics.

[27] T. Rebbeck, et al. Bias Correction with a Single Null Marker for Population Stratification in Candidate Gene Association Studies, 2005, Human Heredity.

[28] P. Armitage. Tests for Linear Trends in Proportions and Frequencies, 1955.

[29] P. Donnelly, et al. Association mapping in structured populations, 2000, American journal of human genetics.