A Fast Hierarchical Clustering Algorithm for Functional Modules Discovery in Protein Interaction Networks

With advances in technologies for predicting protein interactions, huge data sets represented as networks have become available. Identifying functional modules in such networks is crucial for understanding the principles of cellular organization and function. However, protein interaction data produced by high-throughput experiments generally have high false-positive rates, which makes it difficult to identify functional modules accurately. In this paper, we propose HC-PIN, a fast hierarchical clustering algorithm based on a local metric, the edge clustering value, that can be applied to both unweighted and weighted networks. We apply HC-PIN to the yeast protein interaction network and validate the identified modules against all three types of Gene Ontology (GO) terms: Biological Process, Molecular Function, and Cellular Component. The experimental results show that HC-PIN is robust to false positives and can also discover functional modules with low density. The identified modules are statistically significant with respect to all three types of GO annotations. Moreover, by varying its parameter, HC-PIN can uncover the hierarchical organization of functional modules, which corresponds approximately to the hierarchical structure of GO annotations. Compared with previous competing algorithms, HC-PIN is faster and more accurate.
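The abstract gives neither HC-PIN's exact edge clustering value formula nor its merge rule, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each edge is scored by a hypothetical local measure, the fraction of possible triangles through the edge that actually exist, z_uv / min(k_u - 1, k_v - 1), taken as 0 when either endpoint has degree 1. This is close in spirit to the edge clustering coefficient of Radicchi et al. but omits its +1 term, and it is not claimed to be the paper's ECV. Clusters are then grown by single-linkage agglomeration over edges in descending score order, stopping at a hypothetical `threshold` parameter. Only the unweighted case is shown; the names `edge_score` and `cluster` are illustrative.

```python
from collections import defaultdict


def edge_score(adj, u, v):
    """Hypothetical local edge score (a stand-in, not the paper's exact ECV).

    Fraction of possible triangles through (u, v) that exist:
    shared neighbours / min(deg(u) - 1, deg(v) - 1), or 0 if either degree is 1.
    """
    shared = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return shared / denom if denom > 0 else 0.0


def cluster(edges, threshold):
    """Single-linkage agglomeration: merge endpoints of edges scoring >= threshold."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Every protein starts as its own singleton cluster (union-find forest).
    parent = {node: node for node in adj}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Score every edge once, then visit edges from highest to lowest score.
    scored = sorted(
        ((edge_score(adj, u, v), u, v) for u, v in edges),
        key=lambda t: t[0],
        reverse=True,
    )
    for score, u, v in scored:
        if score < threshold:
            break  # all remaining edges score lower still
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two clusters

    groups = defaultdict(set)
    for node in adj:
        groups[find(node)].add(node)
    # Largest clusters first; ties broken by sorted member names.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


# Two triangles joined by a single bridge edge C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("D", "F")]
print(cluster(edges, 0.5))  # the two triangles as separate clusters
print(cluster(edges, 0.0))  # every edge passes, so one merged cluster
```

Because lowering `threshold` can only add merges, sweeping it produces nested partitions. This loosely mirrors how the abstract describes HC-PIN exposing a hierarchy of modules as its parameter varies, although the paper's actual parameter and stopping criterion may differ.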
