Conditions Under Which Genome-Wide Association Studies Will be Positively Misleading

Genome-wide association mapping is a popular method for using natural variation within a species to generate a genotype–phenotype map. Statistical association between an allele at a locus and the trait in question is used as evidence that variation at the locus is responsible for variation in the trait. Indirect association, however, can give rise to statistically significant results at loci unrelated to the trait. We use a haploid, three-locus, binary genetic model to describe the conditions under which these indirect associations become stronger than any of the causative associations in the organism, even to the point of being the only associations present in the data. These indirect associations are the result of disequilibrium between multiple factors affecting a single trait. Epistasis and population structure can exacerbate the problem but are not required to create it. From a statistical point of view, indirect associations are true associations rather than the result of stochastic noise: they will not be ameliorated by increasing sample size or marker density, and they can be reproduced in independent studies.
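The effect can be illustrated with a minimal sketch of a haploid, three-locus, binary setup. Everything specific here is an assumption made for illustration and is not taken from the paper: the haplotype frequencies, the additive trait `A + B`, and the helper names `trait` and `correlation`. Loci A and B are causal, locus C has no effect on the trait, and correlations are computed exactly from population frequencies, so no sampling noise is involved.

```python
from math import sqrt

# Hypothetical population: frequency of each haplotype (A, B, C).
# A and B are causal; C is a non-causal marker that happens to
# occur only on haplotypes carrying both causal alleles.
HAPLOTYPES = {
    (1, 1, 1): 0.25,
    (1, 0, 0): 0.25,
    (0, 1, 0): 0.25,
    (0, 0, 0): 0.25,
}


def trait(hap):
    """Assumed purely additive trait: no epistasis, C contributes nothing."""
    a, b, _c = hap
    return a + b


def correlation(freqs, x, y):
    """Exact population correlation between x(hap) and y(hap)."""
    ex = sum(p * x(h) for h, p in freqs.items())
    ey = sum(p * y(h) for h, p in freqs.items())
    cov = sum(p * (x(h) - ex) * (y(h) - ey) for h, p in freqs.items())
    var_x = sum(p * (x(h) - ex) ** 2 for h, p in freqs.items())
    var_y = sum(p * (y(h) - ey) ** 2 for h, p in freqs.items())
    return cov / sqrt(var_x * var_y)


# Association of each locus with the trait.
r_a = correlation(HAPLOTYPES, lambda h: h[0], trait)
r_b = correlation(HAPLOTYPES, lambda h: h[1], trait)
r_c = correlation(HAPLOTYPES, lambda h: h[2], trait)

print(f"causal A: {r_a:.4f}  causal B: {r_b:.4f}  non-causal C: {r_c:.4f}")
```

In this toy population the non-causal marker C correlates with the trait at about 0.82, while each causal locus reaches only about 0.71. Because these are population-level quantities, drawing a larger sample from the same population would only estimate them more precisely; it would not make the indirect association go away.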
