FAST Community Detection for Proteins Graph-Based Functional Classification

In this paper we present and evaluate a fast, parallel method for assessing the similarity of node-labeled, edge-weighted graphs that represent protein binding pockets. To predict the functional family of a protein, its binding pockets can be modeled as graphs that capture their geometry and physicochemical composition without information loss. Community detection can then be used to facilitate similarity measurement on these graphs. Our approach is based on a parallel implementation of a community detection algorithm that adapts and extends the Louvain method. Compared with existing solutions, our method can achieve a nearly balanced workload among processors and higher graph-clustering accuracy on large real-world graphs.
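To make the Louvain building block concrete, the following is a minimal serial sketch of its first phase (greedy local moving of nodes between communities to increase weighted modularity). It is not the parallel algorithm described above, it ignores node labels, and it omits the graph-aggregation phase; the function names `build_graph`, `modularity`, and `local_moving` and the toy graph are illustrative assumptions.

```python
from collections import defaultdict


def build_graph(edges):
    """Build a symmetric adjacency map from (u, v, weight) triples (hypothetical helper)."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def modularity(adj, community):
    """Weighted modularity Q = sum_c [ in_c / 2m - (tot_c / 2m)^2 ]."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    internal = defaultdict(float)  # intra-community weight, each edge counted twice
    total = defaultdict(float)     # sum of weighted degrees per community
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            total[community[u]] += w
            if community[u] == community[v]:
                internal[community[u]] += w
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)


def local_moving(adj):
    """Phase 1 of Louvain: move each node to the neighboring community
    with the largest modularity gain until no node moves."""
    community = {u: u for u in adj}              # start with singletons
    k = {u: sum(adj[u].values()) for u in adj}   # weighted degree of each node
    tot = dict(k)                                # total degree of each community
    two_m = sum(k.values())
    improved = True
    while improved:
        improved = False
        for u in adj:
            cu = community[u]
            links = defaultdict(float)           # weight from u to each community
            for v, w in adj[u].items():
                links[community[v]] += w
            tot[cu] -= k[u]                      # take u out of its community
            # Gain of (re)joining community c, up to a constant factor 1/m.
            best, best_gain = cu, links[cu] - tot[cu] * k[u] / two_m
            for c, w_uc in links.items():
                gain = w_uc - tot[c] * k[u] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            tot[best] += k[u]
            if best != cu:
                community[u] = best
                improved = True
    return community


# Toy example (assumed data): two triangles joined by a weak edge.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
graph = build_graph(edges)
partition = local_moving(graph)
print(partition, modularity(graph, partition))
```

On this toy graph the two triangles end up in separate communities. A full Louvain run would then collapse each community into a single node and repeat the local-moving phase on the smaller graph.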
