Turning publicly available gene expression data into discoveries using gene set context analysis

Gene Set Context Analysis (GSCA) is an open-source software package that helps researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g., different cells, tissues and disease types). By providing one or more genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with that pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI) that makes the analysis convenient and customizable, and analysis results can be exported as publication-quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data.
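The query described above can be illustrated with a minimal sketch. It is not GSCA's code or API: the function names, the z-score averaging, the 0.5 cutoff, and the toy data are all assumptions chosen for illustration. It scores a gene set in each sample, applies a "high activity" pattern, and reports the fraction of samples in each biological context that match.

```python
from collections import defaultdict
from statistics import mean, pstdev

def gene_set_activity(expr, gene_set):
    """Average z-scored expression of the genes in gene_set, per sample.

    expr maps gene name -> list of expression values (one per sample).
    This scoring rule is an illustrative assumption, not GSCA's definition.
    """
    n_samples = len(next(iter(expr.values())))
    z_rows = []
    for gene in gene_set:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        z_rows.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [mean(row[i] for row in z_rows) for i in range(n_samples)]

def contexts_matching(activity, contexts, cutoff=0.5):
    """Fraction of samples in each context whose activity exceeds cutoff."""
    hits, totals = defaultdict(int), defaultdict(int)
    for score, ctx in zip(activity, contexts):
        totals[ctx] += 1
        hits[ctx] += score > cutoff  # True counts as 1
    return {ctx: hits[ctx] / totals[ctx] for ctx in totals}

# Toy compendium: two hypothetical genes measured in six annotated samples.
expr = {
    "GENE_A": [1.0, 1.2, 5.0, 5.2, 1.1, 0.9],
    "GENE_B": [2.0, 2.1, 6.0, 6.3, 2.2, 1.9],
}
contexts = ["liver", "liver", "tumor", "tumor", "brain", "brain"]

activity = gene_set_activity(expr, ["GENE_A", "GENE_B"])
fractions = contexts_matching(activity, contexts)
print(fractions)
```

In this toy example only the "tumor" samples show high activity for the gene set, so that context would be flagged. A real analysis would also need a statistical test for enrichment rather than a raw fraction.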
