Testing for genetic associations in arbitrarily structured populations

We present a new statistical test of association between a trait and genetic markers, which we show, both theoretically and empirically, to be robust to arbitrarily complex population structure. The test involves a set of parameters that can be estimated directly from large-scale genotyping data, such as those measured in genome-wide association studies (GWAS). We also derive a new set of methodologies, collectively called the 'genotype-conditional association test' (GCAT), which we show to provide accurate association tests in populations with complex structure, whether that structure is manifested in the genetic or the non-genetic contributions to the trait. We demonstrate the proposed method in a large simulation study and on data from the Northern Finland Birth Cohort study. In the Northern Finland Birth Cohort data, we identify several new significant loci that other methods do not detect. Our proposed framework takes a substantially different approach to the problem from existing methods, such as the linear mixed-model and principal-component approaches.
