A propensity score approach to correction for bias due to population stratification using genetic and non‐genetic factors

Confounding due to population stratification (PS) arises when differences in both allele and disease frequencies exist in a population of mixed racial/ethnic subpopulations. Genomic control, structured association, principal components analysis (PCA), and multidimensional scaling (MDS) approaches have been proposed to address this bias using genetic markers. However, confounding due to PS can also be due to non‐genetic factors. Propensity scores are widely used to address confounding in observational studies but have not been adapted to deal with PS in genetic association studies. We propose a genomic propensity score (GPS) approach to correct for bias due to PS that considers both genetic and non‐genetic factors. We compare the GPS method with PCA and MDS using simulation studies. Our results show that GPS adequately and consistently corrects for bias due to PS. Under no/mild, moderate, and severe PS, GPS yielded estimates with bias close to 0 (mean=−0.0044, standard error=0.0087). Under moderate or severe PS, the GPS method consistently outperforms the PCA method in terms of bias, coverage probability (CP), and type I error. Under moderate PS, the GPS method consistently outperforms the MDS method in terms of CP. PCA maintains relatively high power compared to both the MDS and GPS methods under the simulated situations. GPS and MDS are comparable in terms of statistical properties such as bias, type I error, and power. The GPS method provides a novel and robust tool for obtaining less‐biased estimates of genetic associations that can consider both genetic and non‐genetic factors. Genet. Epidemiol. 33:679–690, 2009. © 2009 Wiley‐Liss, Inc.
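
The general idea of propensity score adjustment for PS can be illustrated with a small simulation. The sketch below is not the authors' implementation; it is a minimal example under several assumptions that do not come from the abstract: two hidden subpopulations, a binary carrier status at the test locus (a dominant coding), a single non‐genetic covariate, a true log odds ratio of 0 for the test locus, and the propensity score entered as a linear covariate in the outcome model. All function names and parameter values (`fit_logistic`, `simulate`, `compare_estimates`, the allele frequencies, the effect sizes) are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def simulate(n=6000, n_markers=20, seed=0):
    """Simulate a stratified sample in which the test locus has NO true effect (assumed setup)."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=n)  # hidden subpopulation label (0 or 1)
    # Ancestry-informative markers: allele frequency differs by subpopulation.
    freq = np.repeat(np.where(pop == 1, 0.7, 0.3)[:, None], n_markers, axis=1)
    markers = rng.binomial(2, freq).astype(float)
    # Non-genetic covariate whose distribution also differs by subpopulation.
    env = rng.normal(loc=1.0 * pop, scale=1.0)
    # Carrier status at the test locus: more common in subpopulation 1.
    carrier = rng.binomial(1, np.where(pop == 1, 0.6, 0.2)).astype(float)
    # Disease depends on subpopulation and environment only (true log-OR of carrier is 0).
    logit_d = -1.5 + 1.0 * pop + 0.5 * env
    disease = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_d))).astype(float)
    return markers, env, carrier, disease


def compare_estimates(seed=0):
    """Return (crude, propensity-adjusted) log odds ratios for the test locus."""
    markers, env, carrier, disease = simulate(seed=seed)
    ones = np.ones(len(disease))

    # Crude model: disease ~ carrier, ignoring stratification.
    crude = fit_logistic(np.column_stack([ones, carrier]), disease)[1]

    # Propensity model: P(carrier | genetic markers, non-genetic covariate).
    Z = np.column_stack([ones, markers, env])
    score = 1.0 / (1.0 + np.exp(-(Z @ fit_logistic(Z, carrier))))

    # Outcome model: disease ~ carrier + propensity score.
    adjusted = fit_logistic(np.column_stack([ones, carrier, score]), disease)[1]
    return crude, adjusted


if __name__ == "__main__":
    crude, adjusted = compare_estimates(seed=0)
    print(f"crude log-OR:    {crude:.3f}")
    print(f"adjusted log-OR: {adjusted:.3f}  (true value in this simulation: 0)")
```

In this setup the crude estimate is pulled away from 0 because carrier status and disease risk both vary with the hidden subpopulation, whereas the score‐adjusted estimate should lie much closer to 0. Matching or stratifying on the score, rather than entering it as a covariate, would be alternative ways of using it.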
