Extracting local information for identifying differentially expressed pathways

This paper proposes extracting local information from a gene set to identify differentially expressed (DE) gene pathways. DE pathways are more informative than single DE genes for understanding a biological process, and their identification has drawn increasing attention recently. Current methods are mainly based on identifying single DE genes and do not account for correlations between genes. We propose to extract local correlations in a pathway of interest by randomly sampling multiple gene subsets from it and using a logistic regression model to measure how well the local correlation pattern in each subset predicts phenotypic labels. The differential expression significance of the pathway is then assessed by combining the p-values with which the subsets predict phenotypic labels into a single combined p-value. The proposed method, referred to as locLR, is evaluated on three simulated data sets and one real-world data set, and is shown to be more powerful for identifying DE pathways than previous methods.
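The procedure described above (sample gene subsets, fit a logistic regression per subset, combine the subset p-values) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how each subset's p-value is computed or how the p-values are combined, so the sketch assumes a likelihood-ratio test against an intercept-only model and Fisher's method, which treats the subset p-values as independent even though overlapping subsets are not. All function names (`loclr_sketch`, `subset_pvalue`, `chi2_sf`, `standardize`, `log_sigmoid`) and the defaults for subset count and size are hypothetical.

```python
import math
import random

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if stat <= 0.0:
        return 1.0
    x = stat / 2.0
    # Regularized upper incomplete gamma Q(a, x), built up from a = 1/2 or 1
    # with the recurrence Q(a + 1, x) = Q(a, x) + x^a e^(-x) / Gamma(a + 1).
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:
        a, q = 1.0, math.exp(-x)
    while a < df / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)

def standardize(X):
    """Scale each column to mean 0 and standard deviation 1."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - mu) ** 2 for v in c) / n) or 1.0
           for c, mu in zip(cols, means)]
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, sds)] for row in X]

def log_sigmoid(s):
    """Numerically stable log(1 / (1 + exp(-s)))."""
    return s if s < -30.0 else -math.log1p(math.exp(-s))

def subset_pvalue(X, y, steps=200, lr=0.5):
    """Assumed test: likelihood ratio of a logistic model vs. intercept only.
    The fit uses plain gradient ascent, so it is approximate. Labels must be
    0/1 with both classes present."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            r = yi - 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))
            gb += r
            for j in range(m):
                gw[j] += r * xi[j]
        b += lr * gb / n
        w = [wj + lr * g / n for wj, g in zip(w, gw)]
    ll_full = 0.0
    for xi, yi in zip(X, y):
        z = b + sum(wj * xj for wj, xj in zip(w, xi))
        ll_full += log_sigmoid(z if yi == 1 else -z)
    n1 = sum(y)
    ll_null = n1 * math.log(n1 / n) + (n - n1) * math.log(1.0 - n1 / n)
    return chi2_sf(max(0.0, 2.0 * (ll_full - ll_null)), m)

def loclr_sketch(expr, labels, n_subsets=20, subset_size=3, seed=0):
    """expr: samples x genes matrix for one pathway; labels: 0/1 phenotypes.
    Returns (combined p-value, list of subset p-values)."""
    rng = random.Random(seed)
    n_genes = len(expr[0])
    pvals = []
    for _ in range(n_subsets):
        genes = rng.sample(range(n_genes), subset_size)  # one random gene subset
        X = standardize([[row[g] for g in genes] for row in expr])
        pvals.append(subset_pvalue(X, labels))
    # Assumed combination rule: Fisher's method, -2 * sum(log p) ~ chi2(2k).
    fisher = -2.0 * sum(math.log(max(p, 1e-300)) for p in pvals)
    return chi2_sf(fisher, 2 * len(pvals)), pvals
```

Calling `loclr_sketch(expr, labels)` on a samples-by-genes expression matrix for one pathway returns the combined p-value together with the per-subset p-values. Because the subsets share samples and may share genes, the combined value from Fisher's method is likely to be anti-conservative; a permutation-based null over the phenotype labels would be one way to calibrate it.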