Prior knowledge guided eQTL mapping for identifying candidate genes

Background

Expression quantitative trait loci (eQTL) mapping is often used to identify genetic loci and candidate genes correlated with traits. Although complex traits are usually affected by groups of genes, most eQTL mapping methods treat genes as independent. Recently, some eQTL mapping methods have accounted for correlated genes and used biological prior knowledge, and these have been applied in model species such as yeast and mouse. However, biological prior knowledge may be very limited for most species.

Results

We propose a data-driven, prior knowledge guided eQTL mapping method for identifying candidate genes. First, quantitative trait loci (QTL) analysis is used to identify single nucleotide polymorphism (SNP) markers associated with traits. Then, co-expressed gene modules are generated, and the modules significantly associated with traits are selected. Finally, prior knowledge from QTL mapping is used for eQTL mapping on the selected modules. We tested prior knowledge guided eQTL mapping against eQTL mapping with no prior knowledge in a simulation study and in two barley stem rust resistance case studies.

The results of the simulation study and the barley case studies show that models using prior knowledge outperform models without prior knowledge. In the first case study, three gene modules were selected, and one of them was enriched with defense response Gene Ontology (GO) terms. In addition, one probe in that gene module maps to Rpg1, previously identified as a resistance gene to stem rust. In the second case study, four gene modules were identified, and one of them was significantly enriched with GO terms for defense response to fungus and bacterium.

Conclusions

Prior knowledge guided eQTL mapping is an effective method for identifying candidate genes. The stem rust case studies show that this approach is robust and outperforms methods with no prior knowledge in identifying candidate genes.
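
The abstract does not say how the QTL prior enters the eQTL model. One plausible reading, offered here only as an assumption, is a lasso regression in which SNPs flagged by QTL analysis receive a smaller penalty than the rest. The sketch below illustrates that idea in plain Python; the function names (`prior_weights`, `weighted_lasso`), the `discount` parameter, and the toy data are all hypothetical and are not taken from the paper.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the lasso proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def prior_weights(n_snps, qtl_snps, discount=0.5):
    """Hypothetical prior: SNPs flagged by QTL analysis get a reduced penalty weight."""
    return [discount if j in qtl_snps else 1.0 for j in range(n_snps)]


def weighted_lasso(X, y, lam, weights, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, kept up to date after each update
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero marker carries no information
            # Covariance of marker j with the partial residual (marker j's effect added back).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy data (assumed): 4 lines, 2 markers coded -1/+1, one expression summary per line.
X_toy = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y_toy = [1.0, 0.0, 0.0, -1.0]  # equals 0.5 * marker0 + 0.5 * marker1

flat = weighted_lasso(X_toy, y_toy, lam=0.6, weights=[1.0, 1.0])
guided = weighted_lasso(X_toy, y_toy, lam=0.6, weights=prior_weights(2, {0}))
print("no prior:  ", flat)
print("with prior:", guided)
```

In this toy design the two markers are orthogonal and equally correlated with the response, so a uniform penalty of 0.6 removes both, while halving the penalty on marker 0 keeps it in the model. This shows only the mechanism by which a prior can change which SNPs are selected; it says nothing about the performance reported in the paper.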