ADAGE signature analysis: differential expression analysis with data-defined gene sets

Background

Gene set enrichment analysis and overrepresentation analysis are commonly used to determine which biological processes are affected in a differential expression experiment. These approaches require biologically relevant gene sets, which are currently curated manually. Manual curation limits the availability and accuracy of gene sets in the many organisms that lack extensively curated resources. New feature learning approaches can now be paired with existing data collections to extract functional gene sets directly from big data.

Results

Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are learned directly from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server (http://adage.greenelab.com) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on cystic fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared them with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr.

Conclusions

We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and a web server to provide ADAGE signature analysis to the community.
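
The two-step idea described in the Results (score each signature per sample, then ask which signatures differ between conditions) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the signature names, gene names, weights, and expression values are invented toy data, the activity is taken to be a simple weighted sum, and a Welch t statistic stands in for whatever differential test the ADAGEpath package actually applies. It does not use or reproduce the ADAGEpath API.

```python
from math import sqrt
from statistics import mean, variance


def signature_activity(weights, expression):
    """Weighted sum of one sample's expression over the genes in a signature."""
    return sum(w * expression[gene] for gene, w in weights.items())


def welch_t(a, b):
    """Welch's t statistic comparing two groups of activity values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_signatures(signatures, group_a, group_b):
    """Score every signature and sort by absolute t statistic, largest first."""
    scores = {}
    for name, weights in signatures.items():
        act_a = [signature_activity(weights, s) for s in group_a]
        act_b = [signature_activity(weights, s) for s in group_b]
        scores[name] = welch_t(act_a, act_b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical signatures: gene -> weight (toy values, not learned by ADAGE).
signatures = {
    "sig_A": {"gene1": 0.9, "gene2": 0.8},
    "sig_B": {"gene3": 0.7, "gene4": 0.6},
}

# Hypothetical expression profiles for two conditions, three samples each.
wild_type = [
    {"gene1": 5.0, "gene2": 5.2, "gene3": 3.0, "gene4": 3.1},
    {"gene1": 5.1, "gene2": 5.0, "gene3": 3.2, "gene4": 2.9},
    {"gene1": 4.9, "gene2": 5.1, "gene3": 3.1, "gene4": 3.0},
]
mutant = [
    {"gene1": 2.0, "gene2": 2.2, "gene3": 3.1, "gene4": 3.0},
    {"gene1": 2.1, "gene2": 1.9, "gene3": 2.9, "gene4": 3.2},
    {"gene1": 1.9, "gene2": 2.1, "gene3": 3.0, "gene4": 3.1},
]

ranking = rank_signatures(signatures, wild_type, mutant)
for name, t_stat in ranking:
    print(f"{name}: t = {t_stat:.2f}")
```

In this toy data only the genes in sig_A change between conditions, so sig_A ranks first. A real analysis would also need multiple-testing correction across the signatures, which the sketch omits.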
