Detection of Complexes in Biological Networks Through Diversified Dense Subgraph Mining

Protein-protein interaction (PPI) networks, providing a comprehensive landscape of protein interaction patterns, enable us to explore biological processes and cellular components at multiple resolutions. For a biological process, a number of proteins need to work together to perform a job. Proteins densely interact with each other, forming large molecular machines or cellular building blocks. Identification of such densely interconnected clusters or protein complexes from PPI networks enables us to obtain a better understanding of the hierarchy and organization of biological processes and cellular components. However, most existing graph clustering algorithms on PPI networks often cannot effectively detect densely connected subgraphs and overlapped subgraphs. In this article, we formulate the problem of complex detection as diversified dense subgraph mining and introduce a novel approximation algorithm to efficiently enumerate putative protein complexes from biological networks. The key insight of our algorithm is that instead of enumerating all dense subgraphs, we only need to find a small diverse subset of subgraphs that cover as many proteins as possible. The problem is modeled as finding a diverse set of maximal dense subgraphs where we develop highly effective pruning techniques to guarantee efficiency. To scale up to large networks, we devise a divide-and-conquer approach to speed up the algorithm in a distributed manner. By comparing with existing clustering and dense subgraph-based algorithms on several yeast and human PPI networks, we demonstrate that our method can detect more putative protein complexes and achieves better prediction accuracy.

[1]  Francesco Bonchi,et al.  Finding Subgraphs with Maximum Total Density and Limited Overlap , 2015, WSDM.

[2]  M. L. Fisher,et al.  An analysis of approximations for maximizing submodular set functions—I , 1978, Math. Program..

[3]  Igor Jurisica,et al.  Protein complex prediction via cost-based clustering , 2004, Bioinform..

[4]  Charalampos E. Tsourakakis,et al.  Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees , 2013, KDD.

[5]  David Eppstein,et al.  Listing All Maximal Cliques in Sparse Graphs in Near-optimal Time , 2010, Exact Complexity of NP-hard Problems.

[6]  Illés J. Farkas,et al.  CFinder: locating cliques and overlapping modules in biological networks , 2006, Bioinform..

[7]  Sune Lehmann,et al.  Link communities reveal multiscale complexity in networks , 2009, Nature.

[8]  Guimei Liu,et al.  Complex discovery from weighted PPI networks , 2009, Bioinform..

[9]  S. Dongen Graph clustering by flow simulation , 2000 .

[10]  Haiyuan Yu,et al.  Detecting overlapping protein complexes in protein-protein interaction networks , 2012, Nature Methods.

[11]  L. Mirny,et al.  Protein complexes and functional modules in molecular networks , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[12]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[13]  Sandra Sudarsky,et al.  Massive Quasi-Clique Detection , 2002, LATIN.

[14]  Sergei Vassilvitskii,et al.  Densest Subgraph in Streaming and MapReduce , 2012, Proc. VLDB Endow..

[15]  Akira Tanaka,et al.  The Worst-Case Time Complexity for Generating All Maximal Cliques , 2004, COCOON.

[16]  Andrew V. Goldberg,et al.  Finding a Maximum Density Subgraph , 1984 .

[17]  Min Wu,et al.  A core-attachment based method to detect protein complexes in PPI networks , 2009, BMC Bioinformatics.

[18]  T. Vicsek,et al.  Uncovering the overlapping community structure of complex networks in nature and society , 2005, Nature.

[19]  R. Ozawa,et al.  A comprehensive two-hybrid analysis to explore the yeast protein interactome , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[20]  Hisao Tamaki,et al.  Greedily Finding a Dense Subgraph , 2000, J. Algorithms.

[21]  Andreas Krause,et al.  Streaming submodular maximization: massive data summarization on the fly , 2014, KDD.

[22]  Akira Tanaka,et al.  The worst-case time complexity for generating all maximal cliques and computational experiments , 2006, Theor. Comput. Sci..

[23]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[24]  Takeaki Uno,et al.  An Efficient Algorithm for Solving Pseudo Clique Enumeration Problem , 2008, Algorithmica.

[25]  James R. Knight,et al.  A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae , 2000, Nature.

[26]  P. Bork,et al.  Proteome survey reveals modularity of the yeast cell machinery , 2006, Nature.

[27]  Stijn van Dongen,et al.  Graph Clustering Via a Discrete Uncoupling Process , 2008, SIAM J. Matrix Anal. Appl..

[28]  Jia Wang,et al.  Redundancy-aware maximal cliques , 2013, KDD.

[29]  Shigehiko Kanaya,et al.  Development and implementation of an algorithm for detection of protein complexes in large interaction networks , 2006, BMC Bioinformatics.

[30]  Xing-yuan Wang,et al.  Uncovering the overlapping community structure of complex networks by maximal cliques , 2014 .

[31]  Gary D. Bader,et al.  An automated method for finding molecular complexes in large protein interaction networks , 2003, BMC Bioinformatics.

[32]  Jiawei Han,et al.  Mining coherent dense subgraphs across massive biological networks for functional discovery , 2005, ISMB.