A General Population-Genetic Model for the Production by Population Structure of Spurious Genotype–Phenotype Associations in Discrete, Admixed or Spatially Distributed Populations

In linkage disequilibrium mapping of genetic variants causally associated with phenotypes, spurious associations can potentially be generated by any of a variety of types of population structure. However, mathematical theory of the production of spurious associations has largely been restricted to population structure models that involve the sampling of individuals from a collection of discrete subpopulations. Here, we introduce a general model of spurious association in structured populations, appropriate whether the population structure involves discrete groups, admixture among such groups, or continuous variation across space. Under the assumptions of the model, we find that a single common principle—applicable to both the discrete and admixed settings as well as to spatial populations—gives a necessary and sufficient condition for the occurrence of spurious associations. Using a mathematical connection between the discrete and admixed cases, we show that in admixed populations, spurious associations are less severe than in corresponding mixtures of discrete subpopulations, especially when the variance of admixture across individuals is small. This observation, together with the results of simulations that examine the relative influences of various model parameters, has important implications for the design and analysis of genetic association studies in structured populations.
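
The contrast between discrete and admixed populations can be illustrated with a toy calculation. The sketch below is not the model of the paper. It assumes two source subpopulations, a non-causal allele with frequencies `p`, a binary phenotype with frequencies `q`, and a linear dependence of both on each individual's ancestry fraction, with genotype and phenotype independent given that ancestry. The function name `spurious_covariance` and all numerical values are hypothetical.

```python
from statistics import fmean


def spurious_covariance(ancestry, p, q):
    """Covariance between a non-causal allele and a binary phenotype.

    ancestry: per-individual fraction of ancestry from subpopulation 1
              (0 or 1 for members of discrete subpopulations).
    p, q:     (subpop 1, subpop 2) allele and phenotype frequencies.

    Assumption (illustration only): an individual's allele and phenotype
    probabilities are ancestry-weighted averages of the subpopulation
    values, and the two are independent given ancestry.
    """
    allele = [a * p[0] + (1 - a) * p[1] for a in ancestry]
    pheno = [a * q[0] + (1 - a) * q[1] for a in ancestry]
    joint = [x * y for x, y in zip(allele, pheno)]
    # Cov(G, Y) = E[E(G|a) E(Y|a)] - E[G] E[Y]
    return fmean(joint) - fmean(allele) * fmean(pheno)


p = (0.8, 0.2)  # hypothetical allele frequencies
q = (0.3, 0.1)  # hypothetical phenotype frequencies

discrete = spurious_covariance([0, 0, 1, 1], p, q)
admixed = spurious_covariance([0.4, 0.4, 0.6, 0.6], p, q)
no_pheno_diff = spurious_covariance([0, 0, 1, 1], p, (0.2, 0.2))

print(discrete, admixed, no_pheno_diff)
```

Under these assumptions the covariance equals the variance of the ancestry fraction times `(p[0] - p[1]) * (q[0] - q[1])`. It is about 0.03 for the equal mixture of discrete groups and about 0.0012 for the weakly admixed sample with the same mean ancestry. It vanishes when the phenotype frequency is the same in both subpopulations.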
