Multi-population GWA mapping via multi-task regularized regression

Motivation: Population heterogeneity through admixing of different founder populations can produce spurious associations in genome-wide association studies that are linked to the population structure rather than the phenotype. Because samples from the same population generally co-evolve, different populations may or may not share the same genetic underpinnings for a seemingly common phenotype. Our goal is to develop a unified framework for detecting causal genetic markers through a joint association analysis of multiple populations.

Results: Based on a multi-task regression principle, we present a multi-population group lasso algorithm that uses L1/L2-regularized regression for joint association analysis of multiple populations, stratified either via population survey or computational estimation. Our algorithm combines information from genetic markers across populations to identify causal markers. It also implicitly accounts for correlations between the genetic markers, thus enabling better control over false positive rates. Joint analysis across populations detects weak associations common to all populations with greater power than a separate analysis of each population. At the same time, the regression-based framework allows causal alleles that are unique to a subset of the populations to be correctly identified. We demonstrate the effectiveness of our method on HapMap-simulated and lactase persistence datasets, where we significantly outperform state-of-the-art methods, with greater power for detecting weak associations and fewer spurious associations.

Availability: Software will be available at http://www.sailing.cs.cmu.edu/

Contact: epxing@cs.cmu.edu
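The L1/L2 penalty described above can be illustrated with a short sketch. For each SNP j, its K effect sizes in the K populations form one group; the penalty sums the Euclidean (L2) norms of these groups, which is an L1 norm over SNPs, so a SNP tends to be dropped from all populations at once or kept with population-specific effect sizes. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a quantitative phenotype with a squared-error loss, centers the data within each population in place of explicit intercepts, and minimizes the objective with plain proximal gradient descent (ISTA), a solver chosen here only for brevity. The function names, the penalty weight `lam`, and the simulated two-population data are hypothetical.

```python
import numpy as np


def group_soft_threshold(B, thresh):
    """Proximal operator of thresh * sum_j ||B[j, :]||_2 (row-wise block soft-thresholding)."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Rows whose norm falls below the threshold are set exactly to zero;
    # the others are shrunk toward zero. np.maximum guards against 0/0.
    scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
    return B * scale


def multi_population_group_lasso(Xs, ys, lam, n_iter=500):
    """Hypothetical ISTA sketch for
        min_B  0.5 * sum_k ||y_k - X_k B[:, k]||^2  +  lam * sum_j ||B[j, :]||_2

    Xs : list of K genotype matrices, X_k of shape (n_k, J); all share the same J SNPs.
    ys : list of K phenotype vectors, y_k of shape (n_k,).
    Returns B of shape (J, K); column k holds the SNP effects in population k.
    """
    # Center genotypes and phenotypes within each population (absorbs per-population intercepts).
    Xs = [X - X.mean(axis=0) for X in Xs]
    ys = [y - y.mean() for y in ys]
    J, K = Xs[0].shape[1], len(Xs)
    B = np.zeros((J, K))
    # The squared-error loss is separable across populations, so the Lipschitz constant
    # of its gradient is the largest squared spectral norm among the X_k.
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    for _ in range(n_iter):
        # Column k of grad is the gradient of population k's loss w.r.t. B[:, k].
        grad = np.column_stack([X.T @ (X @ B[:, k] - y)
                                for k, (X, y) in enumerate(zip(Xs, ys))])
        # Gradient step on the loss, then the proximal step for the L1/L2 penalty.
        B = group_soft_threshold(B - step * grad, step * lam)
    return B


def simulate(n=200, J=20, seed=0):
    """Hypothetical data: SNP 0 is causal in both populations, SNP 1 only in population 0."""
    rng = np.random.default_rng(seed)
    Xs, ys = [], []
    for k in range(2):
        freqs = rng.uniform(0.2, 0.5, size=J)                   # population-specific allele frequencies
        X = rng.binomial(2, freqs, size=(n, J)).astype(float)  # 0/1/2 minor-allele counts
        beta = np.zeros(J)
        beta[0] = 1.0
        if k == 0:
            beta[1] = 1.0
        ys.append(X @ beta + 0.5 * rng.standard_normal(n))
        Xs.append(X)
    return Xs, ys


if __name__ == "__main__":
    Xs, ys = simulate()
    B = multi_population_group_lasso(Xs, ys, lam=30.0)
    print("selected SNPs:", np.flatnonzero(np.linalg.norm(B, axis=1)))
    print("effects of SNPs 0 and 1 per population:\n", B[:2].round(2))
```

On this simulated data one would expect the null SNPs to be zeroed jointly across both populations, SNP 0 to be kept in both, and SNP 1 to be kept with a large coefficient in population 0 and only a small, shrunken one in population 1. The proximal step is what makes the selection happen at the level of whole SNPs rather than individual (SNP, population) coefficients.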
