RelExplain—integrating data and networks to explain biological processes

Motivation: The goal of many genome-wide experiments is to explain the changes between the analyzed conditions. Typically, the analysis starts with a set of differential genes DG, and the first step is to identify the set of relevant biological processes BP. Current enrichment methods identify the involved biological processes via statistically significant overrepresentation of differential genes in predefined sets, but do not further explain how the differential genes interact with each other or which other genes might be important for the enriched process. Other, network-based methods determine subnetworks of interacting genes containing many differential genes, but do not employ process knowledge for a more focused analysis.

Results: RelExplain is a method to analyze a given biological process bp (e.g. identified by enrichment) in more detail by computing an explanation using the measured DG and a given network. An explanation is a subnetwork that contains the differential genes in the process bp and connects them in the best way given the experimental data, also using genes that are not differential or not in bp. RelExplain takes into account the functional annotations of nodes and the consistency of edges with the measurements. Explanations are compact networks consisting of the relevant part of the bp and additional nodes that might be important for the bp. Our evaluation showed that RelExplain is better suited than other algorithms to retrieving manually curated subnetworks from unspecific networks. The interactive RelExplain tool allows users to compute and inspect sub-optimal and alternative optimal explanations.

Availability and Implementation: A webserver is available at https://services.bio.ifi.lmu.de/relexplain.

Contact: berchtold@bio.ifi.lmu.de

Supplementary information: Supplementary data are available at Bioinformatics online.
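To illustrate what "connecting the differential genes in the best way" could look like, the following is a minimal sketch that treats the task as a Steiner-tree-style problem and solves it with a greedy shortest-path heuristic. This is an assumption made for illustration only: the abstract does not state which algorithm or scoring RelExplain uses. The function name `connect_terminals`, the gene names and the edge costs are all hypothetical; in a real setting the costs would be where functional annotations and edge consistency enter.

```python
import heapq


def connect_terminals(edges, terminals):
    """Greedy shortest-path heuristic for a Steiner-tree-like subnetwork.

    edges: dict mapping (u, v) -> non-negative cost (treated as undirected).
    terminals: nodes that must be connected (e.g. differential genes in bp).
    Returns the chosen edges as a set of frozensets {u, v}.
    NOTE: illustrative sketch only, not the published RelExplain method.
    """
    adj = {}
    for (u, v), cost in edges.items():
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    remaining = set(terminals)
    start = min(remaining)          # deterministic starting terminal
    remaining.discard(start)
    tree_nodes = {start}
    tree_edges = set()

    while remaining:
        # Multi-source Dijkstra from every node already in the tree.
        dist = {n: 0.0 for n in tree_nodes}
        prev = {}
        heap = [(0.0, n) for n in tree_nodes]
        heapq.heapify(heap)
        target = None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue            # stale heap entry
            if u in remaining:
                target = u          # nearest unconnected terminal
                break
            for v, cost in adj.get(u, []):
                nd = d + cost
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
        if target is None:
            raise ValueError("some terminals cannot be reached")

        # Walk the path back to the tree and add its edges and nodes.
        node = target
        while node not in tree_nodes:
            parent = prev[node]
            tree_edges.add(frozenset((node, parent)))
            tree_nodes.add(node)
            node = parent
        remaining.discard(target)

    return tree_edges


# Hypothetical toy network: A, B, C are differential genes in bp,
# X is a non-differential gene that links them cheaply.
toy_edges = {
    ("A", "X"): 1.0,
    ("X", "B"): 1.0,
    ("A", "B"): 3.0,
    ("X", "C"): 1.0,
    ("B", "C"): 2.5,
}
explanation = connect_terminals(toy_edges, {"A", "B", "C"})
```

In this toy case the three terminals are joined through the non-differential node X, which mirrors the idea that an explanation may include genes that are not differential or not in bp. The heuristic is not guaranteed to find an optimal tree; an exact or differently scored method may return another subnetwork.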
