Background
Many recent studies have investigated modularity in biological networks and its role in the functional and structural characterization of the constituent biomolecules. A technique that has shown considerable promise for modularity detection is the Newman and Girvan (NG) algorithm. It relies on the betweenness of an edge, defined as the number of shortest paths between pairs of vertices in the network that traverse that edge. The edge with the highest betweenness is iteratively removed from the network, and the betweenness of the remaining edges is recalculated in every iteration. This generates a complete dendrogram, from which modules are extracted by applying a quality metric called modularity, denoted by Q. This exhaustive computation can be prohibitively expensive for large networks such as protein-protein interaction networks. In this paper, we present a novel optimization of the modularity detection algorithm: an efficient termination criterion, based on a target edge betweenness value, at which the process of iterative edge removal may be stopped.

Results
We validate the robustness of our approach by applying our algorithm to real-world protein-protein interaction networks of yeast, C. elegans and Drosophila, and demonstrate that it consistently achieves significant reductions in runtime compared to the NG algorithm. Furthermore, our algorithm produces modules comparable to those from the NG algorithm, both qualitatively and quantitatively. We illustrate this using comparison metrics such as module distribution, module membership cardinality, modularity Q, and the Jaccard similarity coefficient.

Conclusions
We have presented an optimized approach for efficient modularity detection in networks. The intuition driving our approach is to extract holistic measures of centrality from graphs, which are representative of the inherent modular structure of the underlying network, and to apply those measures to guide the modularity detection process efficiently. We have empirically evaluated our approach in the specific context of real-world, large-scale biological networks, and have demonstrated significant savings in computational time while maintaining comparable quality of the detected modules.
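The idea of stopping edge removal at a target betweenness can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `target_betweenness` parameter, and the stopping rule (stop once the highest edge betweenness is no greater than the target) are assumptions made for illustration, since the abstract does not state how the target value is derived. The sketch assumes an unweighted, undirected graph without self-loops, and it does not compute modularity Q.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours (no self-loops assumed).
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma)
        # and record shortest-path predecessors of every reached node.
        sigma, dist, preds, order = {s: 1}, {s: 0}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered vertex pair was counted from both endpoints.
    return {edge: value / 2.0 for edge, value in bet.items()}


def components(adj):
    """Connected components of the graph, returned as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def detect_modules(edges, target_betweenness):
    """Remove the highest-betweenness edge repeatedly, recomputing each time.

    Assumed stopping rule: terminate once the highest betweenness is no
    greater than target_betweenness (a hypothetical parameter).
    Returns the resulting modules and the number of edges removed.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    removed = 0
    while True:
        bet = edge_betweenness(adj)
        if not bet:
            break
        edge, top = max(bet.items(), key=lambda item: item[1])
        if top <= target_betweenness:
            break
        u, v = tuple(edge)
        adj[u].discard(v)
        adj[v].discard(u)
        removed += 1
    return components(adj), removed


def jaccard(a, b):
    """Jaccard similarity coefficient of two module membership sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0
```

On a toy graph of two triangles joined by a single bridge edge, calling `detect_modules` with a target of 2.0 removes only the bridge and returns the two triangles as modules, without building the full dendrogram. The `jaccard` helper shows how the membership of a module from one run could be compared against a module from another run, in the spirit of the comparison metric named in the Results.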