Functional module identification in protein interaction networks by interaction patterns

MOTIVATION Identifying functional modules in protein-protein interaction (PPI) networks may shed light on the functional organization of the cell and, in turn, on the underlying cellular mechanisms. Many existing module identification algorithms aim to detect densely connected groups of proteins as potential modules. However, by relying on this simple topological criterion of 'higher than expected connectivity', those algorithms may miss biologically meaningful modules in which proteins have similar interaction patterns with other proteins in the network but are not necessarily densely connected to each other. A few blockmodel module identification algorithms have been proposed to address this problem, but their lack of a global optimum guarantee and their prohibitive computational complexity have limited their application to real-world large-scale PPI networks.
RESULTS In this article, we propose a novel optimization formulation, LCP(2) (low two-hop conductance sets), based on the concept of Markov random walks on graphs. Searching for LCP(2) by random walk enables simultaneous identification of both dense and sparse modules based on protein interaction patterns in a given network. A spectral approximation algorithm, SLCP(2), is derived to identify non-overlapping functional modules. Using a bottom-up greedy strategy, we further extend LCP(2) to a new algorithm, GLCP(2) (greedy algorithm for LCP(2)), to identify overlapping functional modules. We compare SLCP(2) and GLCP(2) with a range of state-of-the-art algorithms on synthetic networks and real-world PPI networks. Performance evaluation on several criteria, covering protein complex prediction, high-level Gene Ontology term prediction and especially sparse module detection, demonstrates that our algorithms based on searching for LCP(2) outperform all other compared algorithms.
AVAILABILITY AND IMPLEMENTATION All data and code are available at http://www.cse.usf.edu/~xqian/fmi/slcp2hop/.
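
To illustrate why a two-hop random walk can expose sparse modules that a density criterion misses, the following minimal sketch computes the probability that a stationary random walk started inside a node set leaves it after a given number of steps. This is only an illustration under stated assumptions: the function name `k_hop_conductance`, the toy bipartite graph and the exact normalization are hypothetical choices, not the LCP(2) objective or the SLCP(2)/GLCP(2) code distributed by the authors.

```python
import numpy as np

def k_hop_conductance(adj, members, hops=2):
    """Probability that a stationary random walk started in `members`
    is outside `members` after `hops` steps.

    Assumes an undirected graph with no isolated nodes (every degree > 0).
    Illustrative definition only, not the paper's exact objective.
    """
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    # Row-stochastic transition matrix of the simple random walk.
    transition = adj / degree[:, None]
    # Stationary distribution of the walk on an undirected graph.
    stationary = degree / degree.sum()
    # Transition probabilities after `hops` steps.
    k_step = np.linalg.matrix_power(transition, hops)
    inside = np.zeros(len(adj), dtype=bool)
    inside[list(members)] = True
    # Stationary mass that starts inside the set and ends outside it.
    escape = (stationary[inside][:, None] * k_step[np.ix_(inside, ~inside)]).sum()
    return escape / stationary[inside].sum()

# Toy "sparse module": nodes 0 and 1 share the same partners (2, 3, 4)
# but are not connected to each other.
adj = np.zeros((5, 5))
for i in (0, 1):
    for j in (2, 3, 4):
        adj[i, j] = adj[j, i] = 1.0
side = [0, 1]

print("one-hop:", k_hop_conductance(adj, side, hops=1))
print("two-hop:", k_hop_conductance(adj, side, hops=2))
```

In this toy graph every edge leaves the set {0, 1}, so its one-hop conductance is 1 and a density-based method would not group the two nodes. Every two-step walk from the set returns to it, so its two-hop conductance is 0, which matches the intuition that proteins with similar interaction patterns form a low two-hop conductance set.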
