FarmCPUpp: Efficient large‐scale genomewide association studies

Abstract

Genomewide association studies (GWAS) are computationally demanding analyses that use large sample sizes and dense marker sets to discover associations between quantitative trait variation and genetic variants. FarmCPU is a powerful new method for performing GWAS. However, its performance is hampered by details of its implementation and its reliance on the R programming language. In this paper, we present an efficient implementation of FarmCPU, called FarmCPUpp, that retains the R user interface but improves memory management and speed through the use of C++ code and parallel computing.
