Differential Expression Enrichment Tool (DEET): an interactive atlas of human differential gene expression

Differential gene expression analysis of RNA sequencing (RNA-seq) data is a standard approach for making biological discoveries. Ongoing large-scale efforts to process and normalize publicly available gene expression data enable rapid and systematic reanalysis. Although several powerful tools systematically process RNA-seq data and enable its reanalysis, few resources systematically recompute the differentially expressed genes (DEGs) reported by individual studies. We developed a robust differential expression analysis pipeline and used it to recompute 3162 human DEG lists from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression Consortium (GTEx), and 142 studies within the Sequence Read Archive (SRA). After measuring the accuracy of the recomputed DEG lists, we built the Differential Expression Enrichment Tool (DEET), which enables users to interact with them. DEET, available through CRAN and RShiny, systematically queries which of the recomputed DEG lists share genes, pathways, and transcription factor (TF) targets with a user's gene list. By identifying relevant studies based on these shared results, DEET aids hypothesis generation and data-driven literature review.

Highlights

By curating metadata from uniformly processed human RNA-seq studies, we created a database of 3162 differential expression analyses.
These analyses cover TCGA, GTEx, and 142 unique studies in SRA, and involve 985 distinct experimental conditions.
DEET allows users to systematically compare their gene lists to this database.
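The idea of querying which DEG lists share genes with a user's gene list can be illustrated with a simple overlap test. The sketch below is only an illustration under stated assumptions: it uses a one-sided hypergeometric test, which is a common choice for gene-set overlap but is not confirmed here to be the statistic DEET uses. The function names (`hypergeom_sf`, `rank_deg_lists`), the gene symbols, the study labels, and the background size are all hypothetical.

```python
from math import comb


def hypergeom_sf(overlap, background, set_size, query_size):
    """P(X >= overlap) when query_size genes are drawn without replacement
    from a background of `background` genes, `set_size` of which belong to
    the reference DEG list (upper tail of the hypergeometric distribution)."""
    upper = min(set_size, query_size)
    total = comb(background, query_size)
    tail = sum(
        comb(set_size, k) * comb(background - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def rank_deg_lists(query, deg_lists, background_size):
    """Rank reference DEG lists by how strongly they overlap the query.

    Returns (list name, number of shared genes, p-value) tuples sorted by
    ascending p-value. No multiple-testing correction is applied here.
    """
    query = set(query)
    results = []
    for name, genes in deg_lists.items():
        genes = set(genes)
        shared = query & genes
        p = hypergeom_sf(len(shared), background_size, len(genes), len(query))
        results.append((name, len(shared), p))
    return sorted(results, key=lambda r: r[2])


# Toy example: made-up study labels and an assumed background of 20000 genes.
user_genes = ["IL6", "CXCL8", "VEGFA", "TXNIP", "ID1"]
reference = {
    "study_A_TNF_vs_control": ["IL6", "CXCL8", "VEGFA", "EDN1", "ID1", "NFKB1"],
    "study_B_unrelated": ["ALB", "APOA1", "TXNIP", "TTR", "FGB", "SERPINA1"],
}
for name, n_shared, p in rank_deg_lists(user_genes, reference, 20000):
    print(f"{name}: {n_shared} shared genes, p = {p:.3g}")
```

In practice the p-values from many comparisons would need multiple-testing correction, and the same pattern could be repeated with pathway or TF-target sets in place of gene lists.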
