Identification of protein complexes from co-immunoprecipitation data

Motivation: Advanced technologies are producing large-scale protein–protein interaction data at an ever increasing pace. A fundamental challenge in analyzing these data is the inference of protein machineries. Previous methods for detecting protein complexes have been mainly based on analyzing binary protein–protein interaction data, ignoring the more involved co-complex relations obtained from co-immunoprecipitation experiments. Results: Here, we devise a novel framework for protein complex detection from co-immunoprecipitation data. The framework aims at identifying sets of preys that significantly co-associate with the same set of baits. In application to an array of datasets from yeast, our method identifies thousands of protein complexes. Comparing these complexes to manually curated ones, we show that our method attains very high specificity and sensitivity levels (∼ 80%), outperforming current approaches for protein complex inference. Availability: Supplementary information and the program are available at http://www.cs.tau.ac.il/~roded/CODEC/main.html. Contact: roded@post.tau.ac.il Supplementary information: Supplementary data are available at Bioinformatics online.

[1]  P. Bork,et al.  Functional organization of the yeast proteome by systematic analysis of protein complexes , 2002, Nature.

[2]  Caroline C. Friedel,et al.  Bootstrapping the Interactome: Unsupervised Identification of Protein Complexes in Yeast , 2008, J. Comput. Biol..

[3]  Robert Gentleman,et al.  Local modeling of global interactome networks , 2005 .

[4]  Andrew Emili,et al.  Identifying functional modules in the physical interactome of Saccharomyces cerevisiae , 2007, Proteomics.

[5]  Dmitrij Frishman,et al.  MIPS: a database for genomes and protein sequences , 1999, Nucleic Acids Res..

[6]  K. Sneppen,et al.  Specificity and Stability in Topology of Protein Networks , 2002, Science.

[7]  B. Alberts The Cell as a Collection of Protein Machines: Preparing the Next Generation of Molecular Biologists , 1998, Cell.

[8]  Sean R. Collins,et al.  Toward a Comprehensive Atlas of the Physical Interactome of Saccharomyces cerevisiae*S , 2007, Molecular & Cellular Proteomics.

[9]  Anton J. Enright,et al.  An efficient algorithm for large-scale detection of protein families. , 2002, Nucleic acids research.

[10]  Sean R. Collins,et al.  Global landscape of protein complexes in the yeast Saccharomyces cerevisiae , 2006, Nature.

[11]  P. Bork,et al.  Proteome survey reveals modularity of the yeast cell machinery , 2006, Nature.

[12]  김삼묘,et al.  “Bioinformatics” 특집을 내면서 , 2000 .

[13]  Roded Sharan,et al.  Discovering statistically significant biclusters in gene expression data , 2002, ISMB.

[14]  Gary D Bader,et al.  Analyzing yeast protein–protein interaction data obtained from different sources , 2002, Nature Biotechnology.

[15]  Insuk Lee,et al.  A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality , 2007, BMC Bioinformatics.

[16]  Alexander Schliep,et al.  Identifying protein complexes directly from high-throughput TAP data with Markov random fields , 2007, BMC Bioinformatics.

[17]  Hilla Peretz,et al.  The , 1966 .

[18]  Jacques van Helden,et al.  Evaluation of clustering algorithms for protein-protein interaction networks , 2006, BMC Bioinformatics.

[19]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[20]  Nagiza F. Samatova,et al.  From pull-down data to protein interaction networks and complexes with biological relevance. , 2008, Bioinformatics.

[21]  David Botstein,et al.  SGD: Saccharomyces Genome Database , 1998, Nucleic Acids Res..

[22]  M. Newman,et al.  On the uniform generation of random graphs with prescribed degree sequences , 2003, cond-mat/0312028.

[23]  R. Karp,et al.  From the Cover : Conserved patterns of protein interaction in multiple species , 2005 .

[24]  M. Mann,et al.  Analysis of proteins and proteomes by mass spectrometry. , 2001, Annual review of biochemistry.

[25]  M. Coffin,et al.  Receiver operating characteristic studies and measurement errors. , 1997, Biometrics.

[26]  Gary D. Bader,et al.  An automated method for finding molecular complexes in large protein interaction networks , 2003, BMC Bioinformatics.

[27]  R. Milo,et al.  Subgraphs in random networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.