Sensitive and Specific Identification of Protein Complexes in "Perturbed" Protein Interaction Networks from Noisy Pull-Down Data

High-throughput mass-spectrometry technology has enabled genome-scale discovery of protein-protein interactions. Yet computational inference of protein interaction networks and their functional modules from large-scale pull-down data remains challenging. An over-expressed or "sticky" bait is not specific and generates numerous false positives. This "curse" of the technique is also its "blessing": a sticky bait can pull down interacting components of other complexes, thus increasing sensitivity. Finding an optimal trade-off between coverage and accuracy requires tuning multiple "knobs," i.e., method parameters. Each parameter selection yields a putative network, and each network in the resulting set of "perturbed" networks differs from the others by a few added or removed edges. Identification of functional modules in such networks is often based on graph-theoretical methods such as maximal clique enumeration. Because maximal clique enumeration is NP-hard, only a limited number of parameter settings can be explored. This paper presents an efficient iterative framework for sensitive and specific detection of protein complexes from noisy protein interaction data.
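
To make the notion of "perturbed" networks concrete, the following minimal Python sketch builds two putative networks from the same scored bait-prey pairs by changing a single confidence threshold, then enumerates the maximal cliques of each. It is an illustration only, not the framework presented in the paper: the protein names, the scores, and the helper names `network_at` and `maximal_cliques` are hypothetical, and the enumeration uses the classical Bron-Kerbosch procedure with pivoting, recomputed from scratch for every network.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (pivoting variant).

    adj maps each node to the set of its neighbours.
    """
    cliques = set()

    def expand(r, p, x):
        if not p and not x:
            cliques.add(frozenset(r))  # r cannot be extended further
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def network_at(scored_edges, threshold):
    """Keep only the edges whose score reaches the threshold (one 'knob')."""
    adj = {}
    for (u, v), score in scored_edges.items():
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if score >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


# Hypothetical interaction scores, invented for illustration.
scored_edges = {
    ("A", "B"): 0.90, ("A", "C"): 0.80, ("B", "C"): 0.85,
    ("C", "D"): 0.60, ("B", "D"): 0.55, ("D", "E"): 0.90,
}

strict = maximal_cliques(network_at(scored_edges, 0.7))
loose = maximal_cliques(network_at(scored_edges, 0.5))

# Cliques that appear only after the threshold is relaxed.
gained = loose - strict
```

Relaxing the threshold from 0.7 to 0.5 adds two edges and one new candidate complex, while the other cliques are unchanged. Re-running the full enumeration for every setting, as this sketch does, repeats most of the work, which is why the number of settings that can be explored this way is limited.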
