Identifying Protein Complexes in High-Throughput Protein Interaction Screens Using an Infinite Latent Feature Model

We propose a Bayesian approach to identify protein complexes and their constituents from high-throughput protein-protein interaction screens. An infinite latent feature model that allows for multi-complex membership by individual proteins is coupled with a graph diffusion kernel that evaluates the likelihood of two proteins belonging to the same complex. Gibbs sampling is then used to infer a catalog of protein complexes from the interaction screen data. An advantage of this model is that it places no prior constraints on the number of complexes and automatically infers the number of significant complexes from the data. Validation results using affinity purification/mass spectrometry experimental data from yeast RNA-processing complexes indicate that our method is capable of partitioning the data in a biologically meaningful way. A supplementary web site containing larger versions of the figures is available at http://public.kgi.edu/wild/PSBO6/index.html.
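The abstract names two building blocks, a graph diffusion kernel and an infinite latent feature (Indian buffet process) prior. The following is a minimal, illustrative sketch of them in plain Python, not the authors' implementation. The function names `diffusion_kernel` and `sample_ibp`, the parameters `beta` and `alpha`, and the toy graph are assumptions made for this example. The likelihood that couples the two pieces and the Gibbs sampler are not specified in the abstract, so they are omitted.

```python
import math
import random


def diffusion_kernel(adj, beta=0.5, terms=30):
    """Approximate K = exp(beta * H) with H = A - D (negative graph Laplacian).

    Uses a truncated Taylor series, which is adequate for small graphs only.
    """
    n = len(adj)
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                h[i][j] = float(adj[i][j])   # off-diagonal: adjacency
                h[i][i] -= adj[i][j]         # diagonal: minus the degree
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]
    for t in range(1, terms + 1):
        # term becomes (beta * H)^t / t!
        term = [[sum(term[i][k] * h[k][j] for k in range(n)) * beta / t
                 for j in range(n)] for i in range(n)]
        for i in range(n):
            for j in range(n):
                kernel[i][j] += term[i][j]
    return kernel


def _poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def sample_ibp(n_proteins, alpha=1.0, rng=None):
    """Draw a binary protein-by-complex matrix from the Indian buffet process."""
    rng = rng or random.Random(0)
    counts = []  # counts[k] = number of proteins already assigned to complex k
    rows = []
    for i in range(1, n_proteins + 1):
        # Join an existing complex k with probability counts[k] / i.
        row = [1 if rng.random() < m / i else 0 for m in counts]
        for k, z in enumerate(row):
            counts[k] += z
        # Open Poisson(alpha / i) new complexes for this protein.
        new = _poisson(alpha / i, rng)
        row.extend([1] * new)
        counts.extend([1] * new)
        rows.append(row)
    width = len(counts)
    # Pad earlier rows with zeros so the matrix is rectangular.
    return [r + [0] * (width - len(r)) for r in rows]


# Toy interaction graph (hypothetical): path 0-1-2 plus an isolated protein 3.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0]]
K = diffusion_kernel(adj, beta=0.5)
Z = sample_ibp(n_proteins=4, alpha=2.0, rng=random.Random(1))
```

In this sketch, `K[i][j]` is larger for proteins that are close in the interaction graph and zero for proteins in disconnected components. A row of `Z` may contain several ones, which corresponds to a protein belonging to more than one complex, and the number of columns is not fixed in advance. For realistic network sizes a library matrix exponential would be preferable to the Taylor series used here.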
