Bootstrapping the Interactome: Unsupervised Identification of Protein Complexes in Yeast

Protein interactions and complexes are important components of biological systems. Recently, two genome-wide applications of tandem affinity purification (TAP) in yeast have significantly increased the available information on interactions within complexes. Several approaches have been developed to predict protein complexes from these measurements, but they generally depend heavily on additional training data in the form of known complexes. In this article, we present an unsupervised algorithm for the identification of protein complexes that does not require such additional complex information. Based on a Bootstrap approach, we calculate intuitive confidence scores for interactions that are more accurate than all other published scoring methods, and we predict complexes with the same quality as the best supervised predictions. Although there are considerable differences between the Bootstrap predictions and the best published predictions, the set of consistently identified complexes is more than four times as large as for complexes derived from one data set only. Our results illustrate that meaningful and reliable complexes can be determined from the purification experiments alone. As a consequence, the approach presented in this article is easily applicable to large-scale TAP experiments for any species, even if few complexes are already known.
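The abstract does not spell out how the Bootstrap confidence scores are computed, so the following is only a minimal sketch of the general idea under stated assumptions: purifications are resampled with replacement, each resampled set is clustered, and a protein pair is scored by the fraction of resamples in which both proteins land in the same cluster. The function names (`cluster`, `bootstrap_scores`), the toy data, and the use of connected components as the clustering step are all hypothetical placeholders and are not taken from the article.

```python
import random
from collections import defaultdict
from itertools import combinations


def cluster(purifications):
    """Placeholder clustering (an assumption, not the article's method):
    proteins that are linked through shared purifications form one group."""
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for proteins in purifications:
        first = proteins[0]
        find(first)
        for p in proteins[1:]:
            parent[find(p)] = find(first)

    groups = defaultdict(set)
    for p in list(parent):
        groups[find(p)].add(p)
    return list(groups.values())


def bootstrap_scores(purifications, n_samples=200, seed=0):
    """Score each protein pair by how often it is co-clustered
    across bootstrap resamples of the purification list."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    for _ in range(n_samples):
        # Resample purifications with replacement (same size as the input).
        sample = rng.choices(purifications, k=len(purifications))
        for group in cluster(sample):
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] += 1
    return {pair: c / n_samples for pair, c in counts.items()}


# Toy purifications: each tuple lists the proteins found in one experiment.
toy = [("A", "B", "C"), ("A", "B"), ("B", "C"), ("D", "E"), ("D", "E")]
scores = bootstrap_scores(toy)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
```

In this sketch a score near 1 means the pair is grouped together in almost every resample, which is what makes such scores easy to interpret. Pairs that are never linked by any purification, such as A and D in the toy data, receive no score at all.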
