Protein Function Prediction by Spectral Clustering of Protein Interaction Network

The increasing availability of large-scale protein-protein interaction (PPI) data has made it possible to study the basic components and organization of the cellular machinery at the network level. Many studies have shown that clustering a protein interaction network (PIN) is an effective approach for identifying protein complexes and functional modules. A significant number of proteins in such networks remain uncharacterized, and predicting their function remains a major challenge in systems biology. We propose a protein annotation method based on spectral clustering, which first embeds the proteins using eigenvectors of the normalized Laplacian of the PIN graph and then applies a classic clustering algorithm such as k-means. Functions are then assigned to proteins based on their cluster membership. Experiments were performed on PPI data from baker's yeast (Saccharomyces cerevisiae); because this network is noisy and still incomplete, we applied pre-processing and purification steps to it. We also weighted the network edges according to the annotation correlation between nodes. The results show an improvement over previous techniques.
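The pipeline outlined above (normalized-Laplacian embedding, k-means, annotation-based edge weighting, and cluster-based function assignment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the unit-length row normalization of the embedding, the deterministic farthest-point k-means initialization, the edge weight `1 + Jaccard(annotations)` in the hypothetical helper `weight_by_annotation`, and the majority-vote rule in the hypothetical helper `predict_functions` are all assumptions. The six-protein graph and the labels `func_A`/`func_B` are toy data.

```python
import numpy as np
from collections import Counter


def spectral_embedding(adj: np.ndarray, k: int) -> np.ndarray:
    """Embed nodes with the k eigenvectors of the symmetric normalized
    Laplacian L = I - D^{-1/2} A D^{-1/2} having the smallest eigenvalues.
    Rows are rescaled to unit length (a common variant, assumed here)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros(len(deg))
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues are returned in ascending order
    emb = eigvecs[:, :k]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    return emb / norms


def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        dists = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def weight_by_annotation(adj: np.ndarray, annotations: dict) -> np.ndarray:
    """Hypothetical weighting: an edge between two annotated proteins gets
    1 + Jaccard similarity of their function sets; other edges keep weight 1."""
    w = adj.astype(float).copy()
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if adj[i, j] > 0 and i in annotations and j in annotations:
                a, b = annotations[i], annotations[j]
                w[i, j] = 1.0 + len(a & b) / len(a | b)
    return w


def predict_functions(labels: np.ndarray, annotations: dict) -> dict:
    """Hypothetical rule: give each unannotated protein the most frequent
    function among annotated proteins in its cluster (None if there are none)."""
    predictions = {}
    for i, c in enumerate(labels):
        if i in annotations:
            continue
        counts = Counter()
        for j, cj in enumerate(labels):
            if cj == c and j in annotations:
                counts.update(annotations[j])
        predictions[i] = counts.most_common(1)[0][0] if counts else None
    return predictions


# Toy PIN: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
annotations = {0: {"func_A"}, 1: {"func_A"}, 4: {"func_B"}, 5: {"func_B"}}
weights = weight_by_annotation(adj, annotations)
labels = kmeans(spectral_embedding(weights, k=2), k=2)
predictions = predict_functions(labels, annotations)
print(predictions)  # {2: 'func_A', 3: 'func_B'}
```

In this toy graph the second eigenvector of the normalized Laplacian separates the two triangles, so the bridging proteins 2 and 3 each inherit the function of their own triangle. On a real PIN, the number of clusters `k` and the weighting scheme would need to be chosen to match the data.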
