Towards an Integrated Protein-Protein Interaction Network

Protein-protein interactions play a major role in most cellular processes. Identifying the full repertoire of interacting proteins in the cell is therefore of great importance, and the problem has been addressed both experimentally and computationally. Large-scale experimental studies of interacting proteins, although partial and noisy, now allow us to characterize the properties of interacting proteins and to develop predictive algorithms. Most existing algorithms, however, ignore possible dependencies between interacting pairs and predict each interaction independently of the others. In this study, we present a computational approach that overcomes this drawback by predicting protein-protein interactions simultaneously. Our approach also allows us to integrate various protein attributes and to account explicitly for uncertainty in assay measurements. Using the language of relational Markov Random Fields, we build a unified probabilistic model that includes all of these elements. We show how to learn the properties of our model efficiently and then use the model to predict all unobserved interactions simultaneously. Our results show that modeling dependencies between interactions, together with taking protein attributes and measurement noise into account, yields a more accurate description of the protein interaction network. Furthermore, our approach provides new insights into the properties of interacting proteins.
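
The difference between independent and simultaneous prediction can be illustrated with a toy Markov Random Field over three proteins. The sketch below is not the model from this study: the protein names, the prior, the assay error rates, and the triangle potential are all hypothetical values chosen for illustration, and exhaustive enumeration stands in for whatever inference procedure a realistic network would need. It shows only how a noisy assay on two pairs can raise the predicted probability of a third, unassayed pair once interactions are coupled.

```python
import itertools
import math

# Toy network: three proteins, one binary interaction variable per pair.
# All names and numbers below are hypothetical, for illustration only.
PAIRS = [("A", "B"), ("B", "C"), ("A", "C")]

PRIOR_LOG_ODDS = -2.0   # interactions are rare a priori
ASSAY_TP = 0.8          # P(assay positive | true interaction)
ASSAY_FP = 0.1          # P(assay positive | no interaction)

# Noisy assay results: A-B and B-C tested positive, A-C was never assayed.
ASSAY = {("A", "B"): 1, ("B", "C"): 1}


def log_score(state, triangle_weight):
    """Unnormalized log-probability of one joint assignment of interactions."""
    score = 0.0
    for pair, present in zip(PAIRS, state):
        # Univariate potential: prior preference against interaction.
        score += PRIOR_LOG_ODDS * present
        # Noise model: likelihood of the assay result given the true state.
        if pair in ASSAY:
            p_positive = ASSAY_TP if present else ASSAY_FP
            score += math.log(p_positive if ASSAY[pair] else 1.0 - p_positive)
    # Coupling potential: reward a closed triangle of interactions.
    if all(state):
        score += triangle_weight
    return score


def interaction_marginals(triangle_weight):
    """Posterior P(interaction = 1) for every pair, by brute-force enumeration."""
    states = list(itertools.product([0, 1], repeat=len(PAIRS)))
    weights = [math.exp(log_score(s, triangle_weight)) for s in states]
    total = sum(weights)
    return {
        pair: sum(w for s, w in zip(states, weights) if s[i]) / total
        for i, pair in enumerate(PAIRS)
    }


# With a zero triangle weight the model factorizes, so each pair is
# predicted independently; a positive weight couples the three pairs.
independent = interaction_marginals(triangle_weight=0.0)
joint = interaction_marginals(triangle_weight=1.5)

for pair in PAIRS:
    print(pair, round(independent[pair], 3), round(joint[pair], 3))
```

In the independent setting the unassayed pair A-C simply keeps its prior probability, whereas the coupled setting lets the evidence for A-B and B-C propagate to it. Enumeration over all joint states grows exponentially with the number of pairs, so this is feasible only at toy scale.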
