Evaluating Transcription Factor Activity Changes by Scoring Unexplained Target Genes in Expression Data

Several methods predict activity changes of transcription factors (TFs) from a given regulatory network and measured expression data. However, available gene regulatory networks are incomplete and contain many condition-dependent regulations that are not relevant for the specific expression measurement. Moreover, it is typically not known which combination of active TFs is needed to cause a change in the expression of a target gene, and a method to systematically evaluate the inferred activity changes is missing. We present such an evaluation strategy, which indicates for how many target genes the observed expression changes can be explained by a given set of active TFs. Because the exact combination of active TFs needed to activate a gene is typically not known, we consider a gene explained if there exists any combination for which the predicted active TFs can possibly explain the observed change of the gene. We introduce the i-score (inconsistency score), which quantifies how many genes cannot be explained by a given set of TF activity changes. Even under these minimal requirements, published methods yield many unexplained target genes, i.e. large i-scores; this holds for all methods and all expression datasets we evaluated. We provide new optimization methods to calculate the best possible (minimal) i-score given the network and measured expression data. Evaluating this optimized i-score on a large data compendium yields many unexplained target genes in almost every case, which indicates that currently available regulatory networks are still far from complete. Both presented methods, Act-SAT and Act-A*, produce optimal sets of TF activity changes, which can be used to investigate the difficult interplay of expression and network data. A web server and a command line tool to calculate the i-score and to find the active TFs associated with the minimal i-score are available from https://services.bio.ifi.lmu.de/i-score.
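
To make the scoring idea concrete, the following is a minimal sketch of an i-score-like count on a toy network. It rests on several assumptions that the abstract does not state: regulations are signed (+1 activating, -1 repressing), TF activity changes and gene expression changes are discretized to -1, 0 or +1, only genes with a non-zero change are scored, and a gene counts as explained as soon as one regulator's activity change times the edge sign matches the observed change. The names NETWORK, EXPRESSION, is_explained, i_score and minimal_i_score are hypothetical. The exhaustive search in minimal_i_score is only a stand-in for illustration on tiny inputs; it is not Act-SAT or Act-A*.

```python
from itertools import product

# Hypothetical toy network: (TF, target gene) -> edge sign
# (+1 = activating, -1 = repressing). Signed edges are an assumption.
NETWORK = {
    ("TF_A", "g1"): +1,
    ("TF_A", "g2"): -1,
    ("TF_B", "g2"): +1,
    ("TF_B", "g3"): +1,
}

# Hypothetical discretized expression changes (+1 up, -1 down, 0 unchanged).
EXPRESSION = {"g1": +1, "g2": +1, "g3": -1, "g4": +1}


def is_explained(gene, change, activity, network):
    """True if at least one regulator of `gene` could cause `change`."""
    return any(
        activity.get(tf, 0) * sign == change
        for (tf, target), sign in network.items()
        if target == gene
    )


def i_score(activity, expression, network):
    """Count changed genes that no active regulator can explain."""
    return sum(
        1
        for gene, change in expression.items()
        if change != 0 and not is_explained(gene, change, activity, network)
    )


def minimal_i_score(expression, network):
    """Brute-force the smallest score over all TF activity assignments.

    Exponential in the number of TFs; usable only for toy examples.
    """
    tfs = sorted({tf for tf, _ in network})
    best_score, best_activity = None, None
    for states in product((-1, 0, 1), repeat=len(tfs)):
        activity = dict(zip(tfs, states))
        score = i_score(activity, expression, network)
        if best_score is None or score < best_score:
            best_score, best_activity = score, activity
    return best_score, best_activity


if __name__ == "__main__":
    predicted = {"TF_A": +1, "TF_B": 0}
    print("score of prediction:", i_score(predicted, EXPRESSION, NETWORK))
    print("minimal score:", minimal_i_score(EXPRESSION, NETWORK)[0])
```

In this toy setting, g4 has no regulator in the network and therefore stays unexplained under every activity assignment, which mirrors the point that a non-zero minimal score reflects gaps in the network rather than a poor prediction.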
