An Arabidopsis Example of Association Mapping in Structured Samples

A potentially serious disadvantage of association mapping is that marker-trait associations may arise from confounding population structure as well as from linkage to causative polymorphisms. Using genome-wide marker data, we have previously demonstrated that the problem can be severe in a global sample of 95 Arabidopsis thaliana accessions, and that established methods for controlling for population structure are generally insufficient. Here, we use the same sample together with a number of flowering-related phenotypes and data-perturbation simulations to evaluate a wider range of methods for controlling for population structure. We find that, in terms of reducing the false-positive rate while maintaining statistical power, a recently introduced mixed-model approach that takes genome-wide differences in relatedness into account via estimated pairwise kinship coefficients generally performs best. By combining the association results with results from linkage mapping in F2 crosses, we identify one previously known true positive and several promising new associations, but also demonstrate the existence of both false positives and false negatives. Our results illustrate the potential of genome-wide association scans as a tool for dissecting the genetics of natural variation, while at the same time highlighting the pitfalls. The importance of study design is clear; our study is severely under-powered both in terms of sample size and marker density. Our results also provide a striking demonstration of confounding by population structure. While statistical methods can be used to ameliorate this problem, they are not always effective and are certainly not a substitute for independent evidence, such as that obtained via crosses or transgenic experiments. Ultimately, association mapping is a powerful tool for identifying a list of candidates that is short enough to permit further genetic study.
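
For orientation, a mixed model of the kind referred to above is often written in roughly the following form. This is a sketch only: the abstract does not define any notation, so every symbol below is an assumption introduced for illustration, and the exact specification used in the study may differ.

\[
y = X\beta + S\alpha + Zu + e, \qquad \mathrm{Var}(u) = 2K\sigma_g^2, \qquad \mathrm{Var}(e) = I\sigma_e^2
\]

Here $y$ is the vector of phenotypes (for example, a flowering-related trait), $\beta$ collects fixed effects other than the marker being tested, $\alpha$ is the fixed effect of that marker, $u$ is a vector of random polygenic background effects, and $e$ is the residual; $X$, $S$, and $Z$ are the corresponding design matrices. The matrix $K$ holds the estimated pairwise kinship coefficients, so accessions that are more closely related genome-wide are modeled as having more strongly correlated background effects. The test for association is then a test of $\alpha = 0$ conditional on that background correlation.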
