lrgpr: interactive linear mixed model analysis of genome-wide association studies with composite hypothesis testing and regression diagnostics in R

The linear mixed model is the state-of-the-art method for accounting for the confounding effects of kinship and population structure in genome-wide association studies (GWAS). Current implementations test the effect of one or more genetic markers, but they can include other variables, such as sex, only as prespecified covariates. Here we develop an efficient implementation of the linear mixed model that supports composite hypothesis tests of genotype interactions with other variables, such as other genotypes, environment, sex or ancestry. Our R package, lrgpr, allows interactive model fitting and examination of regression diagnostics to facilitate exploratory data analysis in the context of the linear mixed model. lrgpr uses parallel and out-of-core computing to handle datasets too large to fit in main memory, which makes it applicable to large GWAS datasets and to next-generation sequencing data.

Availability and implementation: lrgpr is an R package available from lrgpr.r-forge.r-project.org.
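A composite hypothesis test of this kind asks whether several coefficients are jointly zero, for example a genotype main effect together with its genotype-by-environment interaction. The sketch below illustrates the idea in Python/NumPy. It is not lrgpr's implementation or API (lrgpr is an R package). It assumes the variance ratio delta = sigma_e^2 / sigma_g^2 is already known, for example estimated once under the null model. It uses the kinship eigendecomposition trick common to several LMM implementations, and the function name, simulated data and effect size are all hypothetical.

```python
import numpy as np


def lmm_composite_test(y, X, K, delta, test_cols):
    """Joint Wald F test of H0: beta[j] = 0 for every j in test_cols.

    Assumed model: y = X beta + g + e, g ~ N(0, s2_g K), e ~ N(0, s2_e I),
    with delta = s2_e / s2_g treated as known (an assumption of this sketch).
    """
    n, p = X.shape
    # Spectral decomposition K = U diag(s) U^T (computed once per dataset)
    s, U = np.linalg.eigh(K)
    # Rotating by U^T diagonalizes the covariance: Var(U^T y) = s2_g * diag(s + delta)
    y_rot = U.T @ y
    X_rot = U.T @ X
    w = 1.0 / (s + delta)

    # Generalized least squares = weighted least squares in the rotated space
    XtWX = X_rot.T @ (w[:, None] * X_rot)
    XtWy = X_rot.T @ (w * y_rot)
    beta = np.linalg.solve(XtWX, XtWy)

    # sigma2 estimates s2_g in Var(y) = sigma2 * (K + delta I)
    resid = y_rot - X_rot @ beta
    sigma2 = resid @ (w * resid) / (n - p)

    # Wald statistic for the q tested coefficients, divided by q ~ F(q, n - p)
    cov_beta = sigma2 * np.linalg.inv(XtWX)
    idx = np.asarray(test_cols)
    b = beta[idx]
    V = cov_beta[np.ix_(idx, idx)]
    q = len(idx)
    F = b @ np.linalg.solve(V, b) / q
    return beta, F, (q, n - p)


# Usage on simulated (hypothetical) data
rng = np.random.default_rng(0)
n, m = 200, 50
Z = rng.standard_normal((n, m))
K = Z @ Z.T / m                                  # toy kinship matrix
G = rng.binomial(2, 0.3, size=n).astype(float)   # genotype dosage 0/1/2
E = rng.standard_normal(n)                       # environmental covariate
X = np.column_stack([np.ones(n), E, G, G * E])
y = 0.3 * G * E + rng.multivariate_normal(np.zeros(n), K + 0.5 * np.eye(n))

# Jointly test the genotype main effect (column 2) and G x E (column 3)
beta, F, df = lmm_composite_test(y, X, K, delta=0.5, test_cols=[2, 3])
print(F, df)
```

A p-value follows from the F distribution, e.g. `scipy.stats.f.sf(F, *df)`. Because the eigendecomposition of K does not depend on the marker being tested, it can be reused across all markers, and only the small p-by-p system has to be solved for each new design matrix.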
