Effective pre-processing strategies for functional clustering of a protein-protein interactions network

In this article we present novel preprocessing techniques, based on topological measures of the network, to identify clusters of proteins from protein-protein interaction (PPI) networks, wherein each cluster corresponds to a group of functionally similar proteins. The two main problems with analyzing PPI networks are their scale-free property and the large number of false positive interactions that they contain. Our preprocessing techniques use a key transformation and separate weighting functions to eliminate suspect edges (potential false positives) from the graph. A useful side effect of this transformation is that the resulting graph is no longer scale-free. We then examine the application of two well-known clustering techniques, namely hierarchical clustering and multilevel graph partitioning, to the reduced network. We define suitable statistical metrics to evaluate our clusters meaningfully. From our study, we find that clustering the preprocessed network yields significantly improved, biologically relevant, and balanced clusters compared with clusters derived from the original network. We strongly believe that our strategies would prove invaluable to future studies on the prediction of protein functionality from PPI networks.
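The general idea of weighting edges by a topological measure and discarding suspect ones can be illustrated with a minimal sketch. The abstract does not specify the transformation or the weighting functions, so the Jaccard neighborhood overlap used here is a stand-in assumption, not the authors' method; the function names, the toy network, and the threshold of 0.3 are all hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_weight(adj, u, v):
    """Assumed weighting function: overlap of the two endpoints'
    neighbourhoods (excluding the endpoints themselves)."""
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def prune_edges(edges, threshold):
    """Keep only edges whose weight reaches the (hypothetical) threshold."""
    adj = build_adjacency(edges)
    return [(u, v) for u, v in edges if jaccard_weight(adj, u, v) >= threshold]


# Toy network: two triangles joined by a single bridging edge C-D.
toy_edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "D"),
    ("D", "E"), ("D", "F"), ("E", "F"),
]

kept = prune_edges(toy_edges, threshold=0.3)
print(kept)
```

In this toy case the bridging edge C-D has no shared neighbors and is dropped, leaving two dense groups that a clustering step could then recover. Real PPI data would need a weighting function and threshold chosen and validated against functional annotations.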
