Detecting protein complexes from noisy protein interaction data

High-throughput experimental techniques have produced large datasets of experimentally detected protein-protein interactions. However, datasets of experimentally determined protein complexes are neither exhaustive nor reliable. Protein complexes play key roles in disease development, so identifying and characterizing the complexes involved is crucial to understanding the molecular events under normal and abnormal physiological conditions. In this paper, we propose a novel graph mining algorithm to identify protein complexes. The algorithm first assesses the quality of the interaction data and then predicts protein complexes based on the concept of a weighted clustering coefficient. To demonstrate the effectiveness of the proposed method, we present experimental results on yeast protein interaction data. The accuracy achieved is strong evidence in favor of the proposed method. The method also predicted novel protein complexes, which may assist biologists in their search for new complexes. The datasets and programs are freely available from http://faculty.uaeu.ac.ae/nzaki/PE-WCC.htm.
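The abstract describes the pipeline only at a high level: first score the interactions to judge data quality, then use a weighted clustering coefficient to locate dense regions that may correspond to complexes. The Python sketch below illustrates that two-stage idea under explicitly assumed choices. It uses the Jaccard overlap of closed neighbourhoods as a stand-in interaction-reliability score and a fixed score threshold as the stand-in quality check. The weighted clustering coefficient follows the Onnela et al. geometric-mean definition. Candidate complexes are proposed with a simple seed-neighbourhood rule. All function names and thresholds (`min_weight`, `min_wcc`, `min_size`) are hypothetical. This is not the authors' PE-WCC implementation, which is distributed from the URL above.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {protein: set(partners)}, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_weight(adj, u, v):
    """ASSUMED reliability score: Jaccard overlap of the closed neighbourhoods.
    Interactions whose endpoints share many partners score closer to 1."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def weight_edges(adj):
    """Score every undirected interaction once, keyed by frozenset({u, v})."""
    weights = {}
    for u in adj:
        for v in adj[u]:
            key = frozenset((u, v))
            if key not in weights:
                weights[key] = edge_weight(adj, u, v)
    return weights


def filter_edges(adj, weights, min_weight):
    """ASSUMED quality check: drop interactions scoring below min_weight."""
    kept = {u: set() for u in adj}
    for key, w in weights.items():
        if w >= min_weight:
            u, v = tuple(key)
            kept[u].add(v)
            kept[v].add(u)
    return kept


def weighted_clustering(adj, weights, node, w_max):
    """Onnela-style weighted clustering coefficient of `node`:
    (2 / (k(k-1))) * sum over linked neighbour pairs (j, l) of the cube root of
    the product of the three triangle weights, each normalised by w_max."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2 or w_max <= 0:
        return 0.0
    total = 0.0
    for j, l in combinations(sorted(neighbours), 2):
        if l in adj[j]:  # only closed triangles contribute
            prod = (weights[frozenset((node, j))]
                    * weights[frozenset((node, l))]
                    * weights[frozenset((j, l))])
            total += (prod / w_max ** 3) ** (1 / 3)
    return 2 * total / (k * (k - 1))


def predict_complexes(edges, min_weight=0.3, min_wcc=0.3, min_size=3):
    """HYPOTHETICAL seed rule: each protein whose weighted clustering coefficient
    reaches min_wcc proposes itself plus its retained partners as a complex."""
    adj = build_graph(edges)
    weights = weight_edges(adj)
    adj = filter_edges(adj, weights, min_weight)
    kept = [weights[frozenset((u, v))] for u in adj for v in adj[u]]
    if not kept:
        return []
    w_max = max(kept)
    complexes, seen = [], set()
    for node in sorted(adj):
        if weighted_clustering(adj, weights, node, w_max) >= min_wcc:
            members = frozenset(adj[node] | {node})
            if len(members) >= min_size and members not in seen:
                seen.add(members)
                complexes.append(members)
    return complexes
```

For example, on a toy network with a four-protein clique A, B, C, D and a triangle E, F, G joined by a single D-E interaction, the bridge scores 2/7 (below the assumed 0.3 threshold) and is discarded, so `sorted(sorted(c) for c in predict_complexes(edges))` returns `[['A', 'B', 'C', 'D'], ['E', 'F', 'G']]`. Scoring edges by shared neighbours before clustering is what lets the weak, likely spurious bridge be filtered out instead of merging the two groups.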
