Sparse regression models for unraveling group and individual associations in eQTL mapping

Background: As a promising tool for dissecting the genetic basis of common diseases, the study of expression quantitative trait loci (eQTL) has attracted increasing research interest. Traditional eQTL methods focus on testing the associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to biological pathways.

Results: To alleviate this limitation, we propose geQTL, a sparse regression method that can detect both group-wise and individual associations between SNPs and expression traits. geQTL can also correct for the effects of potential confounders. Our method employs a computationally efficient technique and can therefore handle large-scale studies. Moreover, it can automatically infer the proper number of group-wise associations. We perform extensive experiments on both simulated datasets and yeast datasets to demonstrate the effectiveness and efficiency of the proposed method. The results show that geQTL can effectively detect both individual and group-wise signals and outperforms state-of-the-art methods by a large margin.

Conclusions: This paper illustrates that decoupling individual and group-wise associations in association mapping improves eQTL mapping accuracy and the inference of both individual and group-wise associations.
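To make the notion of sparse regression for individual associations concrete, the sketch below fits a plain Lasso from SNP genotypes to a single expression trait using coordinate descent. It is not geQTL: it has no group-wise terms and no confounder correction, and the abstract does not specify geQTL's objective or solver. The function names, the simulated data, the effect sizes, and the penalty value `lam` are all assumptions chosen for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator (proximal map of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for min_w (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1.

    X is a list of n rows with p values each; y is a list of n values.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, starting from w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no signal
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def center(values):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Simulated toy data (assumption): 100 samples, 20 binary SNPs,
# one expression trait driven by SNPs 3 and 7 plus Gaussian noise.
rng = random.Random(0)
n, p = 100, 20
geno = [[float(rng.randint(0, 1)) for _ in range(p)] for _ in range(n)]
expr = [2.0 * row[3] - 1.5 * row[7] + rng.gauss(0.0, 0.3) for row in geno]

# Center every SNP column and the trait.
cols = [center([geno[i][j] for i in range(n)]) for j in range(p)]
X = [[cols[j][i] for j in range(p)] for i in range(n)]
y = center(expr)

w = lasso_cd(X, y, lam=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("selected SNP indices:", selected)
```

The L1 penalty drives most coefficients to exactly zero, so the nonzero entries of `w` act as candidate individual SNP-to-trait associations. A model of this form treats each trait separately, which is the limitation the abstract attributes to traditional approaches.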
