A Coclustering Approach for Mining Large Protein-Protein Interaction Networks

Several approaches have been presented in the literature to cluster Protein-Protein Interaction (PPI) networks. They can be grouped into two main categories: those allowing a protein to participate in different clusters and those generating only nonoverlapping clusters. In both cases, a challenging task is to find a suitable compromise between the biological relevance of the results and a comprehensive coverage of the analyzed networks. Indeed, methods returning highly accurate results are often able to cover only small parts of the input PPI network, especially when poorly characterized networks are considered. We present a coclustering-based technique able to generate both overlapping and nonoverlapping clusters. The density of the clusters to search for can also be set by the user. We tested our method on the yeast and human networks, and compared it with five other well-known techniques on the same interaction data sets. The results showed that, for all the examples considered, our approach always reaches a good compromise between accuracy and network coverage. Furthermore, the behavior of our algorithm is not influenced by the structure of the input network, unlike all the techniques considered in the comparison, which returned very good results on the yeast network but rather poor results on the human network.
