Network-Based Interpretation of Diverse High-Throughput Datasets through the Omics Integrator Software Package

High-throughput ‘omic’ methods provide sensitive measures of biological responses to perturbations. However, inherent biases in high-throughput assays make it difficult to interpret experiments in which more than one type of data is collected. In this work, we introduce Omics Integrator, a software package that takes a variety of ‘omic’ data as input and identifies putative underlying molecular pathways. The approach applies network optimization algorithms to a network of thousands of molecular interactions to find high-confidence, interpretable subnetworks that best explain the data. These subnetworks connect changes observed in gene expression, protein abundance, or other global assays to proteins that may not have been measured in the screens because of inherent bias or measurement noise. The approach can thereby reveal unannotated molecular pathways that would not be detectable by searching pathway databases. Omics Integrator also provides a framework for incorporating negative evidence alongside positive data. Negative evidence allows Omics Integrator to avoid unexpressed genes and to avoid bias toward highly studied hub proteins, except when they are strongly implicated by the data. The software comprises two tools, Garnet and Forest, that can be run together or independently. They allow a user to integrate multiple types of high-throughput data and to create condition-specific subnetworks of protein interactions that best connect the observed changes across datasets. It is available at http://fraenkel.mit.edu/omicsintegrator and on GitHub at https://github.com/fraenkel-lab/OmicsIntegrator.
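As a rough illustration of how negative evidence can steer a subnetwork away from hub proteins, the sketch below scores candidate subnetworks with a prize-collecting style objective: each retained edge incurs a cost, and each omitted node forfeits its prize. Subtracting a degree-proportional penalty from the prizes is one plausible way to encode negative evidence against hubs. The function names, the `beta` and `mu` parameters, and the toy network are illustrative assumptions, not the package's actual API or its exact objective.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def adjusted_prizes(prizes: Dict[str, float], edges: Iterable[Edge],
                    beta: float = 1.0, mu: float = 0.0) -> Dict[str, float]:
    """Scale measured prizes by beta and subtract mu * degree (assumed hub penalty)."""
    degree: Dict[str, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return {n: beta * prizes.get(n, 0.0) - mu * d for n, d in degree.items()}


def objective(selected: Iterable[Edge], edge_costs: Dict[Edge, float],
              node_prizes: Dict[str, float]) -> float:
    """Cost of a candidate subnetwork: edge costs paid plus prizes of omitted nodes.

    Lower is better. Omitting a node with a negative prize lowers the cost,
    so penalized hubs are included only when they connect enough evidence.
    """
    selected = list(selected)
    covered = {n for edge in selected for n in edge}
    paid = sum(edge_costs[e] for e in selected)
    missed = sum(p for n, p in node_prizes.items() if n not in covered)
    return paid + missed


# Toy interactome (hypothetical): A and B are measured hits; HUB has many partners.
edges = [("A", "HUB"), ("HUB", "B"), ("A", "X"), ("X", "B"),
         ("HUB", "C"), ("HUB", "D"), ("HUB", "E")]
edge_costs = {e: 0.5 for e in edges}
prizes = {"A": 3.0, "B": 3.0}

via_hub = [("A", "HUB"), ("HUB", "B")]
via_x = [("A", "X"), ("X", "B")]

no_penalty = adjusted_prizes(prizes, edges, mu=0.0)
with_penalty = adjusted_prizes(prizes, edges, mu=0.2)

print("mu=0.0:", objective(via_hub, edge_costs, no_penalty),
      objective(via_x, edge_costs, no_penalty))
print("mu=0.2:", objective(via_hub, edge_costs, with_penalty),
      objective(via_x, edge_costs, with_penalty))
```

Without the penalty the two routes between A and B score identically; with it, the route through the low-degree node X scores lower (better) than the route through HUB. A real solver would search over subnetworks rather than compare two hand-picked candidates.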
