GAGE: generally applicable gene set enrichment for pathway analysis

Background
Gene set analysis (GSA) is a widely used strategy for analyzing gene expression data on the basis of pathway knowledge. GSA focuses on sets of related genes and has established major advantages over individual-gene analyses, including greater robustness, sensitivity and biological relevance. However, previous GSA methods have limited applicability because they cannot handle datasets with different sample sizes or experimental designs.

Results
To address these limitations, we present a new GSA method called Generally Applicable Gene-set Enrichment (GAGE). We successfully apply GAGE to multiple microarray datasets with different sample sizes, experimental designs and profiling techniques. GAGE shows significantly better results than two other commonly used GSA methods, GSEA and PAGE. We demonstrate this improvement in three respects: (1) consistency across repeated studies/experiments; (2) sensitivity and specificity; (3) biological relevance of the inferred regulatory mechanisms.

GAGE reveals novel and relevant regulatory mechanisms from both published and previously unpublished microarray studies. From two published lung cancer datasets, GAGE derived a more cohesive and predictive mechanistic scheme underlying lung cancer progression and metastasis. For a previously unpublished BMP6 study, GAGE predicted novel regulatory mechanisms for BMP6-induced osteoblast differentiation, including the canonical BMP-TGF beta signaling, JAK-STAT signaling, Wnt signaling and estrogen signaling pathways, all of which are supported by the experimental literature.

Conclusion
GAGE is generally applicable to gene expression datasets with different sample sizes and experimental designs. GAGE consistently outperformed the two most frequently used GSA methods and inferred statistically and biologically more relevant regulatory pathways. The GAGE method is implemented in R in the "gage" package, available under the GNU GPL from http://sysbio.engin.umich.edu/~luow/downloads.php.
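
The abstract describes GAGE only at a high level. To make the idea concrete, here is a minimal Python sketch of a GAGE-style test for a single gene set. It is written under stated assumptions and is not the published gage implementation, which is the R package named above. For each experimental sample, the sketch computes per-gene log fold changes against the control-group mean and uses a one-sided two-sample t-test to check whether the set's fold changes exceed those of all remaining genes. It then merges the per-sample p-values into one global p-value. Three choices are assumptions made for illustration: the comparison scheme, Welch's t-test against the non-set genes, and Fisher's method for the combination step. Every function name (`gage_like_pvalue`, `welch_t`, `t_upper_tail`, `fisher_combine`) is hypothetical. The Student t tail probability comes from the regularized incomplete beta function, so only the standard library is needed.

```python
import math
import random
import statistics


def _betacf(a, b, x, max_iter=1000, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_upper_tail(t, df):
    """P(T > t) for Student's t distribution with (possibly non-integer) df."""
    tail = 0.5 * reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return tail if t > 0 else 1.0 - tail


def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def fisher_combine(pvals):
    """Fisher's method: -2*sum(ln p) ~ chi-square(2k), closed-form upper tail."""
    half = -sum(math.log(max(p, 1e-300)) for p in pvals)  # x/2 for x = -2*sum(ln p)
    term, total = 1.0, 1.0
    for i in range(1, len(pvals)):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def gage_like_pvalue(expr, gene_set, ref, samp):
    """Global one-sided (up-regulation) p-value for one gene set.

    expr: dict gene -> list of log2 expression values, one per sample column.
    ref, samp: column indices of the control and experimental samples.
    ASSUMPTION: each experimental sample is compared with the control mean,
    Welch's t-test is run against non-set genes, and Fisher's method combines.
    """
    ctrl_mean = {g: statistics.fmean([v[r] for r in ref]) for g, v in expr.items()}
    pvals = []
    for j in samp:
        # Log fold change of every gene: sample j versus the control-group mean.
        fc = {g: v[j] - ctrl_mean[g] for g, v in expr.items()}
        in_set = [x for g, x in fc.items() if g in gene_set]
        background = [x for g, x in fc.items() if g not in gene_set]
        t, df = welch_t(in_set, background)
        pvals.append(t_upper_tail(t, df))
    return fisher_combine(pvals), pvals


if __name__ == "__main__":
    # Toy data: genes g0..g19 are shifted up by 1.5 in the experimental columns.
    rng = random.Random(0)
    ref, samp = [0, 1, 2], [3, 4, 5]
    expr = {}
    for i in range(200):
        base = rng.gauss(8.0, 1.0)
        expr[f"g{i}"] = [base + rng.gauss(0, 0.5) + (1.5 if (i < 20 and j in samp) else 0.0)
                         for j in range(6)]
    print(gage_like_pvalue(expr, {f"g{i}" for i in range(20)}, ref, samp))
```

Every comparison reuses the same control-group mean, so the per-sample p-values are not independent. Fisher's method assumes independence, so it gives only an approximate global p-value here. A test for down-regulation only needs the lower tail, for example `t_upper_tail(-t, df)`.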
