GSEA-InContext: identifying novel and common patterns in expression experiments

Motivation: Gene Set Enrichment Analysis (GSEA) is routinely used to analyze and interpret coordinated pathway-level changes in transcriptomics experiments. For an experiment where fewer than seven samples per condition are compared, GSEA employs a competitive null hypothesis to test significance. A gene set enrichment score is tested against a null distribution of enrichment scores generated from permuted gene sets, where genes are randomly selected from the input experiment. Looking across a variety of biological conditions, however, genes are not randomly distributed, and many show consistent patterns of up- or down-regulation. As a result, common patterns of positively and negatively enriched gene sets are observed across experiments. Placing a single experiment into the context of a relevant set of background experiments allows us to identify both the common and the experiment-specific patterns of gene set enrichment.

Results: We compiled a compendium of 442 small molecule transcriptomic experiments and used GSEA to characterize common patterns of positively and negatively enriched gene sets. To identify experiment-specific gene set enrichment, we developed the GSEA-InContext method, which accounts for gene expression patterns within a background set of experiments to identify statistically significantly enriched gene sets. We evaluated GSEA-InContext on experiments using small molecules with known targets to show that it successfully prioritizes gene sets that are specific to each experiment, thus providing valuable insights that complement standard GSEA analysis.

Availability and implementation: The GSEA-InContext implementation in Python, the supplementary results, and the background expression compendium are available at: https://github.com/CostelloLab/GSEA-InContext.
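The competitive null described in the abstract can be illustrated with a minimal Python sketch. It computes a running-sum enrichment score for one gene set in a ranked gene list, then compares it against scores from randomly drawn gene sets of the same size. The function names, the one-sided empirical p-value, and the optional `gene_probs` argument are assumptions made for illustration; they are not taken from the GSEA-InContext code. In particular, `gene_probs` is only a hypothetical hook showing where background-derived information could replace uniform sampling, and it should not be read as the published procedure.

```python
import numpy as np


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score for one gene set.

    ranked_genes and scores must be aligned and sorted from the most
    up-regulated gene to the most down-regulated gene.
    """
    ranked_genes = np.asarray(ranked_genes)
    scores = np.asarray(scores, dtype=float)
    in_set = np.isin(ranked_genes, list(gene_set))
    n_hit = int(in_set.sum())
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    hit_weights = np.where(in_set, np.abs(scores) ** weight, 0.0)
    total = hit_weights.sum()
    if total == 0:
        return 0.0
    # Step up at genes in the set, step down at all other genes.
    running = np.cumsum(hit_weights / total - (~in_set) / n_miss)
    # The score is the running-sum value with the largest absolute deviation from zero.
    return float(running[np.argmax(np.abs(running))])


def permutation_pvalue(ranked_genes, scores, gene_set, n_perm=1000,
                       gene_probs=None, seed=0):
    """Empirical one-sided p-value from a gene-set permutation null.

    gene_probs is a hypothetical extension point (an assumption, not the
    published method): None draws genes uniformly from the input experiment,
    as in the standard competitive null; a probability vector aligned with
    ranked_genes would bias the random gene sets instead.
    """
    rng = np.random.default_rng(seed)
    ranked_genes = np.asarray(ranked_genes)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    size = int(np.isin(ranked_genes, list(gene_set)).sum())
    null = np.empty(n_perm)
    for i in range(n_perm):
        random_set = rng.choice(ranked_genes, size=size, replace=False, p=gene_probs)
        null[i] = enrichment_score(ranked_genes, scores, random_set)
    # Count null scores at least as extreme as the observed one, in the same direction.
    if observed >= 0:
        extreme = np.sum(null >= observed)
    else:
        extreme = np.sum(null <= observed)
    return float((1 + extreme) / (1 + n_perm))
```

Calling `permutation_pvalue(genes, scores, my_set)` with the default `gene_probs=None` reproduces the uniform sampling that the abstract identifies as the limitation: every gene is equally likely to enter a random set, regardless of how consistently it is up- or down-regulated across other experiments.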
