Background
Finding the dominant direction of information flow in densely interconnected regulatory or signaling networks is required in many applications in computational biology and neuroscience. This is achieved by first identifying and removing the links that close feedback loops in the original network, and then arranging the nodes of the remaining network hierarchically. In mathematical terms, this corresponds to making a graph acyclic by removing as few links as possible, thus altering the original graph as little as possible. The exact solution of this problem requires enumerating all cycles and all combinations of removed links; because the problem is NP-hard, this is computationally prohibitive even for networks of modest size.

Results
We introduce and compare two approximate numerical algorithms for solving this problem. The first is probabilistic: it applies simulated annealing to the hierarchical layout of the network, minimizing the number of "backward" links that go from lower to higher hierarchical levels. The second is a deterministic, "greedy" algorithm that sequentially cuts the links that participate in the largest number of feedback cycles. We find that the annealing algorithm outperforms the deterministic one in speed, memory requirement, and the actual number of removed links. To further improve the visual perception of the layout produced by the annealing algorithm, we perform an additional minimization of the length of hierarchical links while keeping the number of anti-hierarchical links at its minimum. The annealing algorithm is then tested on several examples of regulatory and signaling networks and pathways operating in human cells.

Conclusion
The proposed annealing algorithm is powerful enough to produce layouts, often optimal ones, of protein networks in whole organisms, consisting of ~10^4 nodes and ~10^5 links, while the applicability of the greedy algorithm is limited to individual pathways with ~100 vertices. The considered examples indicate that the annealing algorithm produces biologically meaningful layouts: the function of most of the anti-hierarchical links is indeed to send a feedback signal to upstream pathway elements. Source code for the F90 and Matlab implementations of the two algorithms is available at http://www.cmth.bnl.gov/~maslov/programs.htm
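
The annealing idea described in the Results can be illustrated with a small Python sketch. This is not the authors' F90 or Matlab code. It is a minimal illustration under stated assumptions: the hierarchy is simplified to a strict top-to-bottom ordering of nodes, a move swaps two nodes, moves are accepted with the Metropolis rule, and the temperature follows a geometric cooling schedule. The function names, parameter values, and toy network are all hypothetical.

```python
import math
import random


def count_backward(edges, pos):
    """Count links whose source is placed below its target in the ordering."""
    return sum(1 for u, v in edges if pos[u] > pos[v])


def anneal_layout(nodes, edges, t_start=2.0, t_end=0.01, cooling=0.95,
                  moves_per_temp=200, seed=0):
    """Search for a node ordering with few backward links (illustrative only).

    Assumptions: swap moves, Metropolis acceptance, geometric cooling,
    and at least two nodes. Self-loops are ignored because no ordering
    can make them hierarchical.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    edges = {(u, v) for u, v in edges if u != v}

    # Links touching each node, so that a swap can be scored locally.
    touching = {n: set() for n in nodes}
    for u, v in edges:
        touching[u].add((u, v))
        touching[v].add((u, v))

    rng.shuffle(nodes)
    pos = {n: i for i, n in enumerate(nodes)}
    cost = count_backward(edges, pos)
    best_cost, best_pos = cost, dict(pos)

    t = t_start
    while t > t_end and best_cost > 0:
        for _ in range(moves_per_temp):
            a, b = rng.sample(nodes, 2)
            local = touching[a] | touching[b]
            before = count_backward(local, pos)
            pos[a], pos[b] = pos[b], pos[a]
            delta = count_backward(local, pos) - before
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best_cost, best_pos = cost, dict(pos)
            else:
                pos[a], pos[b] = pos[b], pos[a]  # rejected: undo the swap
        t *= cooling

    backward = sorted((u, v) for u, v in edges if best_pos[u] > best_pos[v])
    return best_pos, backward


# Toy network (hypothetical): the chain A -> B -> C -> D, a shortcut A -> C,
# and one feedback link D -> B that closes the cycle B -> C -> D -> B.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "B"), ("A", "C")]
pos, backward = anneal_layout(nodes, edges)
print("ordering:", sorted(pos, key=pos.get))
print("backward links:", backward)
```

Removing the returned backward links always leaves an acyclic network, because every remaining link points from an earlier to a later position in the ordering. In the toy network, any ordering must break the cycle B -> C -> D -> B, so at least one backward link is unavoidable. Which link of the cycle is reported depends on the random seed.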