Correction of Population Stratification in Large Multi-Ethnic Association Studies

Background
The vast majority of genetic risk factors for complex diseases have, taken individually, only a small effect on the end phenotype. Population-based association studies therefore need very large sample sizes to detect significant differences between affected and non-affected individuals. Including thousands of affected individuals in a study requires recruitment at numerous centers, possibly in different geographic regions. Unfortunately, such a recruitment strategy is likely to complicate the study design and to raise concerns about population stratification.

Methodology/Principal Findings
We analyzed 9,751 individuals representing three main ethnic groups (Europeans, Arabs and South Asians) who had been enrolled at 154 centers in 52 countries for a global case/control study of acute myocardial infarction. All individuals were genotyped at 103 candidate genes using 1,536 SNPs selected with a tagging strategy that captures most of the genetic diversity in different populations. We show that relying solely on self-reported ethnicity is not sufficient to exclude population stratification, and we present additional methods to identify and correct for it.

Conclusions/Significance
Our results highlight the importance of carefully addressing population stratification and of thoroughly “cleaning” the sample prior to analysis, both to obtain stronger signals of association and to avoid spurious results.
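The abstract does not specify which correction methods were used, so the following is a generic illustration only. It shows (a) how pooling two subpopulations that differ in allele frequency and in case/control ratio yields a spurious allelic association even though there is no association within either subpopulation, and (b) genomic control (Devlin and Roeder), one widely used way to rescale association statistics for stratification. It is not presented as this study's method: the helper names (`allelic_chi2`, `pooled_table`, `genomic_control`), the group sizes, the allele frequencies and the example statistics are all hypothetical, and expected allele counts are used instead of sampled genotypes so that the output is deterministic.

```python
"""Hypothetical sketch: spurious association from stratification, and genomic control.

Not the study's pipeline; all numbers below are illustrative assumptions.
"""
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = 0.4549


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row = [sum(r) for r in table]
    col = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def allele_counts(n_individuals, alt_freq):
    """Expected (alt, ref) allele counts for n diploid individuals."""
    alt = 2 * n_individuals * alt_freq
    return alt, 2 * n_individuals - alt


def pooled_table(groups):
    """Sum expected allele counts over subpopulations.

    Each group is (n_cases, n_controls, alt_freq). Cases and controls within a
    group share the same allele frequency, so there is NO true association.
    """
    case_alt = case_ref = ctrl_alt = ctrl_ref = 0.0
    for n_cases, n_controls, freq in groups:
        a, r = allele_counts(n_cases, freq)
        case_alt, case_ref = case_alt + a, case_ref + r
        a, r = allele_counts(n_controls, freq)
        ctrl_alt, ctrl_ref = ctrl_alt + a, ctrl_ref + r
    return case_alt, case_ref, ctrl_alt, ctrl_ref


def genomic_control(chi2_stats):
    """Return (lambda, corrected stats); lambda = median / 0.4549, floored at 1."""
    lam = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [s / lam for s in chi2_stats]


if __name__ == "__main__":
    # Hypothetical: subpopulation A (freq 0.2) is over-represented among cases,
    # subpopulation B (freq 0.5) among controls.
    groups = [(800, 200, 0.2), (200, 800, 0.5)]
    within_a = allelic_chi2(*allele_counts(800, 0.2), *allele_counts(200, 0.2))
    pooled = allelic_chi2(*pooled_table(groups))
    print(f"within A: {within_a:.2f}")  # within A: 0.00
    print(f"pooled:   {pooled:.2f}")    # pooled:   142.42

    # Genomic control across many markers (hypothetical statistics).
    lam, corrected = genomic_control([0.2, 0.5, 0.9, 1.1, 3.0])
    print(f"lambda = {lam:.3f}")
```

The pooled statistic of about 142 on 1 degree of freedom is far beyond any conventional significance threshold, yet it arises purely from the unequal mix of subpopulations. Genomic control divides every statistic by a single genome-wide inflation factor; it cannot adapt to markers whose allele frequencies differ unusually strongly between groups, which is one reason ancestry-based adjustments such as principal-component covariates are often used alongside or instead of it.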
