signatureSearch: environment for gene expression signature searching and functional interpretation

Abstract

signatureSearch is an R/Bioconductor package that integrates a suite of existing and novel algorithms into an analysis environment for gene expression signature (GES) searching, combined with functional enrichment analysis (FEA) and visualization methods that facilitate the interpretation of the search results. In a typical GES search (GESS), a query GES is searched against a database of GESs obtained from large numbers of measurements covering, for example, different genetic backgrounds, disease states and drug perturbations. Database matches whose signatures correlate with the query indicate related cellular responses that are frequently governed by connected mechanisms, such as drugs mimicking the expression responses of a disease. To identify which processes are predominantly modulated in the GESS results, we developed specialized FEA methods combined with drug-target network visualization tools. The provided analysis tools are useful for studying the effects of genetic, chemical and environmental perturbations on biological systems, as well as for searching single-cell GES databases to identify novel network connections or cell types. The signatureSearch software is unique in that it provides an integrated environment for GESS/FEA routines that includes several novel search and enrichment methods, efficient data structures, access to pre-built GES databases, and support for custom databases.
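To make the idea of a correlation-based GES search concrete, the following is a minimal Python sketch, not the signatureSearch API (which is an R package). It assumes each signature is a list of expression scores over the same ordered gene set, and it ranks database entries by Spearman correlation with the query. The function names (`spearman`, `search_signatures`) and the toy database entries are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def rank(values: List[float]) -> List[float]:
    """Return 1-based ranks, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def search_signatures(
    query: List[float], database: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score every database signature against the query, best match first."""
    hits = [(name, spearman(query, sig)) for name, sig in database.items()]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Toy data (hypothetical): scores for the same four genes in each signature.
query = [2.0, -1.0, 0.5, -3.0]
database = {
    "drug_A": [1.8, -0.9, 0.4, -2.5],   # mimics the query
    "drug_B": [-2.0, 1.0, -0.5, 3.0],   # reverses the query
    "drug_C": [0.1, 0.2, -0.1, 0.0],    # unrelated
}
hits = search_signatures(query, database)
```

In this sketch, entries at the top of `hits` mimic the query response and entries at the bottom reverse it. Both ends of the ranking can be of interest, for instance when looking for drugs that counteract a disease signature.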
