The Flag-based Algorithm - A Novel Greedy Method that Optimizes Protein Communities Detection

Proteins and the networks they form, called interactome networks, have received considerable attention in recent years, because they have been found to influence complex biological phenomena, including diseases such as cancer. This paper presents a greedy algorithm, implemented in the C programming language, that optimizes the detection of protein communities. The optimization improves protein community detection at two levels: the algorithm itself and its implementation. The performance of the resulting implementation was carefully tested on real biological data, and the results confirm the significant speedup that the optimization brings. Moreover, the results agree with the findings of our previous research, as they reveal and confirm important properties of the proteins that participate in carcinogenesis. Besides being particularly useful for research purposes, the novel community detection algorithm also dramatically speeds up the analysis of proteomic databases compared with other sequential community detection approaches, as well as with the sequential algorithm of Newman and Girvan.