Population Substructure and Control Selection in Genome-Wide Association Studies

Two questions pose a major challenge in the design and analysis of genome-wide association studies (GWAS): how relevant the demanding classical epidemiologic criteria for control selection are, and how to handle population stratification (PS) robustly. Empirical data from two GWAS in European Americans from the Cancer Genetic Markers of Susceptibility (CGEMS) project were used to evaluate the impact of PS in studies with different control selection strategies. In each of the two original case-control studies, which were nested in corresponding prospective cohorts, only a minor confounding effect due to PS was observed (inflation factor λ of 1.025 and 1.005). In contrast, when the control groups were exchanged to mimic a cost-effective but theoretically less desirable control selection strategy, the confounding effects were larger (λ of 1.090 and 1.062). To infer population substructure with principal component analysis, a panel of 12,898 autosomal SNPs was selected that are common to both the Illumina and Affymetrix commercial platforms and have low local background linkage disequilibrium (pairwise r² < 0.004). A novel permutation procedure was developed for the correction of PS; it identified a smaller set of principal components and achieved better control of type I error (reducing λ to 1.032 and 1.006, respectively) than currently used methods. The overlap between the sets of SNPs in the bottom 5% of p-values from the new test and from the test without PS correction was about 80%, and most of the discordant SNPs had both ranks close to the threshold. Thus, for the CGEMS GWAS of prostate and breast cancer conducted in European Americans, PS does not appear to be a major problem in well-designed studies. A study using suboptimal controls can still have acceptable type I error when an effective strategy for the correction of PS is employed.
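The abstract reports inflation factors λ without stating how they were computed. As an illustration only, the sketch below assumes the common genomic-control definition: the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median of the chi-square(1) distribution (about 0.4549). The function name `inflation_factor`, the simulated statistics, and the uniform 1.09 scaling are hypothetical and are not taken from the CGEMS analysis.

```python
import random
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Assumed definition: observed median statistic / expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Simulated null statistics: squared standard normal draws follow chi-square(1).
rng = random.Random(0)
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]

# Hypothetical stratified study: every statistic inflated by the same factor.
inflated_stats = [1.09 * s for s in null_stats]

lam_null = inflation_factor(null_stats)
lam_inflated = inflation_factor(inflated_stats)
print(f"lambda (null): {lam_null:.3f}")
print(f"lambda (inflated): {lam_inflated:.3f}")
```

Under this definition, λ close to 1 indicates test statistics that behave as expected under the null, while values noticeably above 1 (such as the 1.090 reported for the exchanged controls) suggest genome-wide inflation of the kind PS can produce.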
