Response to Sul and Eskin

We thank Sul and Eskin (Mixed models can correct for population structure for genomic regions under selection. Nature Reviews Genetics 26 Feb 2013 (doi:10.1038/nrg2813-c1))¹ for carefully examining and confirming the limitation of standard mixed model association methods that we identified in our Progress article (New approaches to population stratification in genome-wide association studies. Nature Reviews Genetics 11, 459–463 (2010))² and for developing an interesting new way to address it.

In our article², we investigated the limits of mixed model methods by considering an extreme simulation in which most markers had low population differentiation (FST = 0.01), but a small fraction of markers were unusually differentiated (allele frequency difference = 0.6). We found that standard mixed model methods³ did not fully correct for population structure, but mixed models with principal component covariates⁴ did fully correct for population structure. We stated that “population structure is a fixed effect, and spurious associations might result if it is modelled as a random effect based on overall covariance”.

Sul and Eskin¹ have confirmed that, in this extreme simulation, standard mixed model methods do not fully correct for population structure and that mixed models with principal component covariates do fully correct for population structure. They also investigated a new approach, which is to use a mixed model using two kinship matrices: one computed using unusually differentiated markers identified by their spatial ancestry analysis (SPA) method⁵, and one computed using the remaining markers. They reported that this approach also fully corrects for population structure in this simulation.

Thus, population stratification (a fixed effect in this simulation) can be addressed using random effects in a way that we had not previously considered: our review considered only mixed models with a single random effect based on overall covariance³,⁴,⁶⁻⁸ but did not consider mixed models with multiple random effects¹. Another possibility, very similar to the Sul and Eskin¹ approach, is to use a mixed model that uses two kinship matrices: one computed from principal component 1, and one computed using the remaining principal components; this approach is based on the natural decomposition of a kinship matrix into its principal components⁹. This would also fully correct for population structure in this extreme simulation, as Sul and Eskin¹ showed that using a single kinship matrix computed from principal component 1 fully corrects for population structure.

A broader question is whether the limitation of standard mixed model methods that arises in this extreme simulation is a major concern in empirical studies. In our article², we stated that standard mixed model methods are an appealing and simple approach and are sufficient to correct for stratification in many settings. Sul and Eskin¹ indicated that the limitation we described did not arise in the Finnish and UK data sets that they analysed. We agree that mixed models with a single random effect based on overall covariance will probably be sufficient to correct for population structure fully in most settings.

Finally, we note that recent work has raised additional points about mixed model methods, including inclusion versus exclusion of the candidate marker in the kinship matrix, use of only a small subset of markers in computing the kinship matrix and effects of case–control ascertainment¹⁰⁻¹³. We believe that these are important points that merit further investigation, but this is outside the scope of the current Correspondence.
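
The decomposition of a kinship matrix into principal component 1 and the remaining principal components, discussed above, can be illustrated with a minimal Python sketch. This is not code from our article or from Sul and Eskin. It assumes a two-population simulation loosely modelled on the extreme scenario (a Balding-Nichols model for typical markers, and fixed allele frequencies of 0.8 versus 0.2 for the unusually differentiated markers). All function names and parameter defaults (sample size, marker count, fraction of unusual markers) are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_genotypes(n_per_pop=100, n_markers=2000, frac_unusual=0.01,
                       fst=0.01, diff=0.6, seed=0):
    """Simulate 0/1/2 genotypes for two equally sized populations.

    Assumption: typical markers follow a Balding-Nichols model with the
    given FST; unusual markers differ in allele frequency by `diff`.
    """
    rng = np.random.default_rng(seed)
    n_unusual = int(round(frac_unusual * n_markers))
    n_typical = n_markers - n_unusual

    # Typical markers: population frequencies drift around an ancestral value.
    anc = rng.uniform(0.1, 0.9, size=n_typical)
    a = anc * (1 - fst) / fst
    b = (1 - anc) * (1 - fst) / fst
    p1 = rng.beta(a, b)
    p2 = rng.beta(a, b)

    # Unusually differentiated markers: frequencies 0.5 +/- diff / 2.
    p1 = np.concatenate([p1, np.full(n_unusual, 0.5 + diff / 2)])
    p2 = np.concatenate([p2, np.full(n_unusual, 0.5 - diff / 2)])

    g1 = rng.binomial(2, p1, size=(n_per_pop, n_markers))
    g2 = rng.binomial(2, p2, size=(n_per_pop, n_markers))
    return np.vstack([g1, g2]).astype(float)


def kinship(genotypes):
    """Kinship matrix from standardized genotypes (samples x markers)."""
    sd = genotypes.std(axis=0)
    g = genotypes[:, sd > 0]  # drop monomorphic markers
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    return z @ z.T / z.shape[1]


def split_kinship(k, n_top=1):
    """Split K into the part spanned by its top PCs and the remainder."""
    vals, vecs = np.linalg.eigh(k)      # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]      # reorder to descending
    vals, vecs = vals[order], vecs[:, order]
    top = vecs[:, :n_top]
    k_top = (top * vals[:n_top]) @ top.T
    k_rest = k - k_top
    return k_top, k_rest, top


genotypes = simulate_genotypes()
k = kinship(genotypes)
k_pc1, k_rest, pcs = split_kinship(k, n_top=1)
```

In a two-random-effect mixed model, `k_pc1` and `k_rest` would each receive its own variance component, which allows the component for principal component 1 to be estimated separately from the rest. Fitting those variance components is not shown here.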

[1] Eleazar Eskin, et al. Mixed models can correct for population structure for genomic regions under selection, 2013, Nature Reviews Genetics.

[2] Bjarni J. Vilhjálmsson, et al. An efficient multi-locus mixed model approach for genome-wide association studies in structured populations, 2012, Nature Genetics.

[3] Ying Liu, et al. FaST linear mixed models for genome-wide association studies, 2011, Nature Methods.

[4] J. Mefford, et al. The Covariate's Dilemma, 2012, PLoS Genetics.

[5] Eleazar Eskin, et al. Improved linear mixed models for genome-wide association studies, 2012, Nature Methods.

[6] Eran Halperin, et al. A model-based approach for analysis of spatial structure in genetic data, 2012, Nature Genetics.

[7] H. Kang, et al. Variance component model to account for sample structure in genome-wide association studies, 2010, Nature Genetics.

[8] Alkes L. Price, et al. New approaches to population stratification in genome-wide association studies, 2010, Nature Reviews Genetics.

[9] Bjarni J. Vilhjálmsson, et al. The nature of confounding in genome-wide association studies, 2012, Nature Reviews Genetics.

[10] Gustavo de los Campos, et al. Inferences from Genomic Models in Stratified Populations, 2012, Genetics.

[11] Zhiwu Zhang, et al. Mixed linear model approach adapted for genome-wide association studies, 2010, Nature Genetics.

[12] Simon C. Potter, et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis, 2011, Nature.

[13] M. Stephens, et al. Genome-wide Efficient Mixed Model Analysis for Association Studies, 2012, Nature Genetics.