We introduce Pathicular (http://bioinformatics.psb.ugent.be/software/details/Pathicular), a Cytoscape plugin for studying the cellular response to perturbations of transcription factors by integrating perturbational expression data with transcriptional, protein-protein and phosphorylation networks. Pathicular searches for 'regulatory path motifs': short paths in the integrated physical networks that occur significantly more often than expected between transcription factors and their targets in the perturbational data. A case study in Saccharomyces cerevisiae identifies eight regulatory path motifs and demonstrates their biological significance.

Rationale

When a cell is perturbed by external stimuli, it responds by adjusting the amounts of the different types of proteins it needs. Transcriptional regulatory networks form the core of this cellular response system. However, the static wiring of these networks does not reveal which parts of the network are active under certain conditions, or how perturbations are propagated through the network. For this reason there has been much interest in integrating the static network topology with gene expression data, which reflect the dynamical or functional state of the network. A pioneering study identified large changes between the subnetworks of the S. cerevisiae transcriptional regulatory network that are active under five different conditions [1]. In reality, the transcriptional regulatory network cannot be considered in isolation; it is integrated with other networks such as the protein-protein interaction network [2]. In [3], a framework was developed which integrates protein-protein and protein-DNA interactions to identify active subnetworks of physical interactions in perturbational data. These subnetworks extend traditional clustering approaches by grouping genes consistently with the constraints of the physical interaction networks.
In [4], a further step was taken by introducing a probabilistic model to link a causative gene, via paths in the protein-DNA and protein-protein interaction network, to the set of effect genes which are differentially expressed upon knockout of the causative gene, without requiring that the intermediate genes be differentially expressed as well. This approach was used to map DNA-damage response pathways [5] and to jointly model regulatory and metabolic networks [6]. The problem of explaining knockout pairs using physical interactions continues to attract much interest. In [7], an integer programming formulation was introduced, and in [8] an approach based on representing the physical networks by electrical wiring diagrams was applied to the study of expression quantitative trait loci. In [9], a similar approach was used to connect genetic hits to differentially expressed genes using an integrated network containing protein-protein, protein-DNA and metabolic interactions, and in [10] a technique based on the Steiner tree problem was presented. All of these techniques are computationally expensive and try to explain as many knockout or cause-effect pairs as possible in a particular set of experiments, but they do not search for general mechanisms or path structures which are common between different (classes of) knocked-out genes.

A much simpler method was used in [11]. There, all paths of length two in an integrated protein-protein and protein-DNA interaction network connecting a transcription factor to its knockout gene set were kept, in order to study the effect of redundancy between paralogous transcription factors in perturbational data. The optimal path length was determined by a hypergeometric test between the knockout set and the set of genes reached by paths of a given length [11].
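The hypergeometric test mentioned for [11] can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the function names, the gene identifiers and the set sizes are hypothetical and are not taken from [11]. For each candidate path length, the sketch asks how likely it is to see at least the observed overlap between the knockout set and the genes reached by paths of that length, if the reached genes were drawn at random from the genome.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the knockout set."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def path_length_pvalues(knockout_set, reached_by_length, all_genes):
    """Return {path length: hypergeometric p-value of the overlap}."""
    N = len(all_genes)
    pvalues = {}
    for length, reached in reached_by_length.items():
        overlap = len(knockout_set & reached)
        pvalues[length] = hypergeom_sf(overlap, N, len(knockout_set), len(reached))
    return pvalues


# Hypothetical toy data: 20 genes, 5 of them affected by the knockout.
all_genes = {f"g{i}" for i in range(20)}
knockout_set = {"g0", "g1", "g2", "g3", "g4"}
reached_by_length = {
    1: {"g0", "g1"},                            # direct targets only
    2: {"g0", "g1", "g2", "g3", "g10", "g11"},  # targets reached via one intermediate
}

pvalues = path_length_pvalues(knockout_set, reached_by_length, all_genes)
best_length = min(pvalues, key=pvalues.get)
print(pvalues, best_length)
```

In this toy setting the path length with the smallest p-value would be selected; with real data a multiple-testing correction across lengths and transcription factors would also be needed.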
In this paper we present an alternative strategy for elucidating response-to-perturbation mechanisms in integrated networks, based on the notion of a path-like network motif. Standard network motifs are small subgraphs which occur significantly more often in a network than expected by chance and characterize its static properties [12,13], forming functional modules in integrated networks [14]. Recently, it has been shown that overlaying functional data on static network structures allows additional types of network motifs to be discovered [15]. The motifs studied in [15] are so-called activity motifs: short patterns of timed gene expression regulation events occurring significantly more often than expected by chance in the metabolic network of S. cerevisiae. In the same spirit, we define regulatory path motifs as short, significantly enriched paths in integrated physical networks which connect a causative gene (for example, a transcription factor) to a set of effect genes which are differentially expressed after perturbation of the causative gene. Enrichment of a regulatory path indicates that it connects significantly more true cause-effect pairs than suitably randomized cause-effect pairs.

Our method is implemented as a Cytoscape [16] plugin, 'Pathicular', which identifies regulatory path motifs in integrated networks. As a case study, we used comprehensive microarray data sets for 157 transcription factor deletion experiments [17] and 55 transcription factor overexpression experiments [18] in S. cerevisiae, together with large-scale networks of transcriptional regulatory interactions [19,20], protein-protein interactions [21] and phosphorylation interactions [22]. Our algorithm identified eight regulatory path motifs, of which five were enriched in both deletion and overexpression data. These eight motifs explain 13% of all genes differentially expressed in deletion data and 24% in overexpression data, an increase of more than five- to ten-fold compared to using direct transcriptional links only, confirming that perturbational microarray experiments contain mostly indirect regulatory links. We further observed that regulatory path motifs are organized into modules of genes connected to a transcription factor by the same path and the same intermediate nodes. Perturbed targets forming such modules tend to be highly coexpressed and functionally coherent, and we have used this property for predicting periodic genes and associating novel functions with genes. Finally, we considered two condition-dependent data sets, one containing deletion experiments for 27 transcription factors under DNA-damage conditions [5], and one cell cycle-specific data set obtained by selecting only the cell cycle regulators from [17], and compared the relative abundance of each path motif between those data sets.

The current version of Pathicular supports functions to calculate regulatory path significance values for user-defined cause-effect networks and directed or undirected physical interaction networks, to visualize regulatory paths on the integrated interaction network, and to extract and visualize regulatory path modules. Pathicular is freely available for academic use.

* Correspondence: tom.michoel@psb.vib-ugent.be
1 Department of Plant Systems Biology, VIB, Technologiepark 927, B-9052 Gent, Belgium
Joshi et al. Genome Biology 2010, 11:R32 http://genomebiology.com/2010/11/3/R32

Results

Direct transcriptional links in perturbational data

Perturbational expression data can be viewed as a network where each transcription factor is connected to the genes that are differentially expressed after deletion or overexpression of the transcription factor.
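To make the cause-effect network view concrete, the following minimal Python sketch tests whether a given path type (a sequence of edge types) connects more true cause-effect pairs than randomized ones, in the spirit of the regulatory path motif definition given in the Rationale. It is not Pathicular's implementation: the function names, the toy network and the randomization scheme (drawing a random effect set of the same size for each causative gene) are illustrative assumptions, and for brevity the sketch follows walks rather than strictly simple paths.

```python
import random


def reachable(source, path_type, network):
    """Genes reached from `source` by following the edge types in `path_type` in order.
    `network` maps an edge type to an adjacency dict {node: set of neighbours}."""
    frontier = {source}
    for edge_type in path_type:
        adjacency = network.get(edge_type, {})
        frontier = {nbr for node in frontier for nbr in adjacency.get(node, ())}
    return frontier


def explained_pairs(cause_effect, path_type, network):
    """Number of (cause, effect) pairs connected by at least one path of this type."""
    return sum(
        len(reachable(cause, path_type, network) & effects)
        for cause, effects in cause_effect.items()
    )


def path_enrichment(cause_effect, path_type, network, genes, n_random=1000, seed=0):
    """Empirical p-value: how often randomized effect sets (an assumed null model)
    are explained at least as well as the observed ones."""
    rng = random.Random(seed)
    gene_list = sorted(genes)
    observed = explained_pairs(cause_effect, path_type, network)
    hits = 0
    for _ in range(n_random):
        randomized = {
            cause: set(rng.sample(gene_list, len(effects)))
            for cause, effects in cause_effect.items()
        }
        if explained_pairs(randomized, path_type, network) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)


# Hypothetical toy data: undirected protein-protein edges ("ppi") stored in both
# directions, directed transcriptional edges ("tri") stored from regulator to target.
network = {
    "ppi": {"TF1": {"P1"}, "P1": {"TF1"}, "TF2": {"P2"}, "P2": {"TF2"}},
    "tri": {"P1": {"g0", "g1", "g2"}, "P2": {"g3", "g4", "g5"}},
}
cause_effect = {"TF1": {"g0", "g1", "g2"}, "TF2": {"g3", "g4", "g5"}}
genes = {f"g{i}" for i in range(30)}

indirect = path_enrichment(cause_effect, ("ppi", "tri"), network, genes)
direct = path_enrichment(cause_effect, ("tri",), network, genes)
print("ppi->tri:", indirect)
print("tri only:", direct)
```

In this toy example the indirect path type explains all six cause-effect pairs while the direct transcriptional path explains none, mirroring the observation that perturbational data contain mostly indirect links. A realistic null model would have to preserve properties such as the degree distribution of the cause-effect network.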
In [23], the topological properties of the deletion and overexpression networks were compared with a transcriptional network of genome-wide ChIP-chip interactions (TRI(C)), assuming that the deletion and overexpression networks also consist of direct interactions. We added to the comparison a fourth transcriptional network, predicted using cis-regulatory elements (TRI(M)). These four networks contain targets for 23 common transcription factors, but they do not share even a single transcription factor-target pair, although the overlap between each pair of networks is statistically significant (Figure 1). The overlap between TRI(C) and TRI(M) is much higher than for all other pairwise combinations. On the other hand, the overexpression and deletion networks share only about 2% of their interactions with TRI(C) and TRI(M). This indicates that the deletion and overexpression networks do not contain a large fraction of direct targets. We further calculated the overlap between each of these networks for each transcription factor individually (Table S1 in Additional File 1). Consistent with the global analysis, 18 of the 23 transcription factors have significant overlap between TRI(C) and TRI(M). The overlap between the deletion and overexpression networks is relatively small, at 12 transcription factors, but it is known that the deletion and overexpression phenotypes are quite different for most genes [24]. Only seven transcription factors (INO2, GCN4, SWI4, SKN7, HAP4, YAP1 and SOK2) in the overexpression network, and four (SIP4, PUT3, RFX1 and MSN2) in the deletion network, share significant targets with TRI(C), without any overlap between these two sets. The seven overexpression transcription factors mainly act in response to certain conditions; for instance, INO2 is activated in response to inositol depletion and YAP1 is activated under H2O2 stress.
It has been argued that overexpressing a transcription factor mimics the condition of transcription factor activation in response to a stimulus [18]. We also observed that five of these seven transcription factors (INO2, GCN4, SWI4, HAP4 and YAP1) show significant pairwise coexpression with their targets. This suggests that the overexpression method is better suited for direct target prediction of transcription factors which are activated in response to a particular signal. Similar results are obtained by comparing the overexpression and deletion networks to TRI(M).

Indirect regulatory paths in perturbational data

When a tran
[1] Yves Van de Peer, et al. Characterizing regulatory path motifs in integrated networks using perturbational data. Genome Biology, 2010.
[2] E. Fraenkel, et al. Integrating Proteomic, Transcriptional, and Interactome Data Reveals Hidden Components of Signaling and Regulatory Networks. Science Signaling, 2009.
[3] I. Simon, et al. Backup in gene regulatory networks explains differences between binding and knockout results. Molecular systems biology, 2009.
[4] D. Karger, et al. Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity. Nature Genetics, 2009.
[5] D. Koller, et al. Activity motifs reveal principles of timing in transcriptional control of the yeast metabolic network. Nature Biotechnology, 2008.
[6] L. Aravind, et al. Comparison of transcription regulatory interactions inferred from high-throughput methods: what do they reveal? Trends in genetics: TIG, 2008.
[7] Peter Kaiser, et al. A Dominant Suppressor Mutation of the met30 Cell Cycle Defect Suggests Regulation of the Saccharomyces cerevisiae Met4-Cbf1 Transcription Complex by Met32. Journal of Biological Chemistry, 2008.
[8] Yonina C. Eldar, et al. eQED: an efficient method for interpreting eQTL associations using protein networks. Molecular systems biology, 2008.
[9] Andrzej Kudlicki, et al. High-resolution timing of cell cycle-regulated gene expression. Proceedings of the National Academy of Sciences, 2007.
[10] Roded Sharan, et al. SPINE: a framework for signaling-regulatory pathway inference from cause-effect experiments. ISMB/ECCB, 2007.
[11] Pietro Liò, et al. Bottleneck Genes and Community Structure in the Cell Cycle Network of S. pombe. PLoS Comput. Biol., 2007.
[12] Patrick J. Killion, et al. Genetic reconstruction of a functional transcriptional regulatory network. Nature Genetics, 2007.
[13] M. Gerstein, et al. Getting connected: analysis and principles of biological networks. Genes & development, 2007.
[14] J. Collins, et al. Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS biology, 2007.
[15] Erik van Nimwegen, et al. SwissRegulon: a database of genome-wide annotations of regulatory sites. Nucleic Acids Res., 2006.
[16] Lin Tang, et al. Inferring direct regulatory targets from expression and genome location analyses: a comparison of transcription factor deletion and overexpression. BMC Genomics, 2006.
[17] William Stafford Noble, et al. The Forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle. Genes & development, 2006.
[18] Charles Boone, et al. Identifying transcription factor functions and targets by phenotypic activation. Proceedings of the National Academy of Sciences, 2006.
[19] Martin Vingron, et al. A joint model of regulatory and metabolic networks. BMC Bioinformatics, 2006.
[20] T. Ideker, et al. Comprehensive curation and analysis of global interaction networks in Saccharomyces cerevisiae. Journal of biology, 2006.
[21] Trey Ideker, et al. Integrated Assessment and Prediction of Transcription Factor Binding. PLoS Comput. Biol., 2006.
[22] T. Hughes, et al. Mapping pathways and phenotypes by systematic gene overexpression. Molecular cell, 2006.
[23] M. Gerstein, et al. Global analysis of protein phosphorylation in yeast. Nature, 2005.
[24] S. Brunak, et al. New weakly expressed cell cycle-regulated genes in yeast. Yeast, 2005.
[25] T. Jaakkola, et al. Validation and refinement of gene-regulatory pathways on a network of physical interactions. Genome Biology, 2005.
[26] S. L. Wong, et al. Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. Journal of biology, 2005.
[27] Z. Darieva, et al. Regulation of Cell Cycle-Specific Gene Expression through Cyclin-Dependent Kinase-Mediated Phosphorylation of the Forkhead Transcription Factor Fkh2p. Molecular and Cellular Biology, 2004.
[28] M. Gerstein, et al. Genomic analysis of regulatory network dynamics reveals large topological changes. Nature, 2004.
[29] Nicola J. Rinaldi, et al. Transcriptional regulatory code of a eukaryotic genome. Nature, 2004.
[30] Tommi S. Jaakkola, et al. Physical Network Models. J. Comput. Biol., 2004.
[31] T. Hughes, et al. Exploration of Essential Gene Functions via Titratable Promoter Alleles. Cell, 2004.
[32] R. Milo, et al. Network motifs in integrated cellular networks of transcription-regulation and protein-protein interaction. Proceedings of the National Academy of Sciences of the United States of America, 2004.
[33] Z. Oltvai, et al. Network biology: understanding the cell's functional organization. Nature Reviews Genetics, 2004.
[34] A. Barabási, et al. Aggregation of topological motifs in the Escherichia coli transcriptional regulatory network. BMC Bioinformatics, 2004.
[35] R. Milo, et al. Topological generalizations of network motifs. Physical review. E, Statistical, nonlinear, and soft matter physics, 2003.
[36] P. Shannon, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research, 2003.
[37] Hanah Margalit, et al. Detection of regulatory circuits by integrating the cellular networks of protein-protein interactions and transcription regulation. Nucleic acids research, 2003.
[38] S. Shen-Orr, et al. Network motifs: simple building blocks of complex networks. Science, 2002.
[39] M. Eisen, et al. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biology, 2002.
[40] Benno Schwikowski, et al. Discovering regulatory and signalling circuits in molecular interaction networks. ISMB, 2002.
[41] S. Shen-Orr, et al. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 2002.
[42] D. Botstein, et al. Genomic expression responses to DNA-damaging agents and the regulatory role of the yeast ATR homolog Mec1p. Molecular biology of the cell, 2001.
[43] Agnieszka Sirko, et al. A Novel Form of Transcriptional Silencing by Sum1-1 Requires Hst1 and the Origin Recognition Complex. Molecular and Cellular Biology, 2001.
[44] Yudong D. He, et al. Functional Discovery via a Compendium of Expression Profiles. Cell, 2000.
[45] P. Barré, et al. Saccharomyces cerevisiae PAU genes are induced by anaerobiosis. Molecular microbiology, 2000.
[46] Michael Ruogu Zhang, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular biology of the cell, 1998.
[47] P. Blaiseau, et al. Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. The EMBO journal, 1998.
[48] Ronald W. Davis, et al. A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular cell, 1998.
[49] K. Nasmyth, et al. The Saccharomyces cerevisiae Start-specific transcription factor Swi4 interacts through the ankyrin repeats with the mitotic Clb2/Cdc28 kinase and through its conserved carboxy terminus with Swi6. Molecular and cellular biology, 1996.
[50] R. Schneggenburger, et al. A Systems Approach to Mapping DNA Damage Response Pathways. 2006.
[51] S. Shen-Orr, et al. Network Motifs: Simple Building Blocks of Complex Networks. 2002.
[52] C. Ball, et al. Saccharomyces Genome Database. Methods in enzymology, 2002.