Belief Propagation Estimation of Protein and Domain Interactions Using the Sum–Product Algorithm

This paper presents a novel framework for estimating protein-protein interactions (PPIs) and domain-domain interactions (DDIs) using belief propagation, which computes interaction probabilities efficiently. Experimental interactions, domain architecture, and gene ontology (GO) annotations are combined into a factor graph that represents the joint probability distribution of pairwise protein and domain interactions. Bound structures from experiments documented in iPfam serve as a priori evidence of domain interactions. The distribution encoded by the factor graph is then marginalized efficiently with a message-passing algorithm, the sum-product algorithm (SPA). The method is compared against two other approaches: maximum-likelihood estimation (MLE) and maximum specificity set cover (MSSC). SPA outperforms both in simulated scenarios and in inferring high-quality PPI data for Saccharomyces cerevisiae. The framework can be used to predict potential protein and domain interactions at a genome-wide scale for any organism with identified protein-domain architectures.
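
To make the marginalization step concrete, the following is a minimal sketch of sum-product message passing on a toy factor graph. It is not the paper's model. It assumes two hypothetical binary domain-pair variables `d1` and `d2`, each with a prior factor, and one likelihood factor for a single observed protein-pair experiment. The noisy-OR likelihood, the false-positive and false-negative rates (`fp`, `fn`), and all function names are illustrative assumptions. Because this small graph is a tree, the messages give the exact marginal, which the brute-force function confirms.

```python
import itertools


def likelihood(observed, d1, d2, fp=0.05, fn=0.2):
    """Hypothetical noisy-OR factor: P(observation | d1, d2).

    The protein pair is reported as interacting with probability 1 - fn
    if at least one domain pair interacts, and with probability fp otherwise.
    """
    p_obs = (1.0 - fn) if (d1 or d2) else fp
    return p_obs if observed else 1.0 - p_obs


def sum_product_marginal(prior1, prior2, observed):
    """Posterior P(d1 = 1 | observation) via sum-product messages."""
    # Variable-to-factor message from d2: the product of its other
    # incoming messages, which here is only its prior factor.
    msg_d2 = [1.0 - prior2, prior2]
    # Factor-to-variable message to d1: sum d2 out of the likelihood factor.
    msg_to_d1 = [
        sum(likelihood(observed, d1, d2) * msg_d2[d2] for d2 in (0, 1))
        for d1 in (0, 1)
    ]
    # Belief at d1: prior factor times incoming message, then normalize.
    belief = [(1.0 - prior1) * msg_to_d1[0], prior1 * msg_to_d1[1]]
    return belief[1] / (belief[0] + belief[1])


def brute_force_marginal(prior1, prior2, observed):
    """Same posterior by enumerating the full joint distribution."""
    joint = {}
    for d1, d2 in itertools.product((0, 1), repeat=2):
        p1 = prior1 if d1 else 1.0 - prior1
        p2 = prior2 if d2 else 1.0 - prior2
        joint[(d1, d2)] = p1 * p2 * likelihood(observed, d1, d2)
    z = sum(joint.values())
    return (joint[(1, 0)] + joint[(1, 1)]) / z


if __name__ == "__main__":
    # Assumed priors of 0.1 and 0.3 for the two domain pairs.
    print(sum_product_marginal(0.1, 0.3, observed=True))
    print(brute_force_marginal(0.1, 0.3, observed=True))
```

On larger graphs the brute-force sum grows exponentially with the number of variables, while the message-passing cost grows with the number of edges and the factor sizes. That is the efficiency gain the abstract refers to. On graphs with cycles the same updates are iterated and yield approximate marginals.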
