Determining the minimum number of protein-protein interactions required to support known protein complexes

The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data are insufficient to account for all known protein complexes. In this paper, we formulate the problem of determining the minimum number of additional PPIs required as a graph-theoretic problem, under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one based on integer linear programming (ILPMinPPI) and the other based on an existing greedy-type approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is applicable only to small datasets, we apply the latter to the CYC2008 protein complex dataset combined with each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional required PPIs ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not include enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can enhance the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/~nakajima/.
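
To make the graph-theoretic formulation concrete, the following is a minimal sketch of a greedy heuristic of the kind described above. It is not the GreedyMinPPI implementation; it assumes that the connectivity constraint means each complex must induce a connected subgraph, and it simply adds, at each step, the protein pair that joins separate components in the largest number of complexes. The names `DisjointSet` and `greedy_additional_ppis` are hypothetical, and the tie-breaking rule (lexicographic order) is an arbitrary choice made for reproducibility.

```python
from itertools import combinations


class DisjointSet:
    """Union-find over the members of one complex (hypothetical helper)."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.count = len(self.parent)  # number of connected components

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        self.count -= 1
        return True


def greedy_additional_ppis(known_ppis, complexes):
    """Return a list of PPIs to add so that every complex becomes connected.

    Assumption: a complex is 'supported' when its proteins induce a
    connected subgraph of the PPI network. The result is a heuristic
    solution, not a guaranteed minimum.
    """
    complexes = [set(c) for c in complexes]
    sets = [DisjointSet(c) for c in complexes]

    # Merge components using the PPIs that are already known.
    for u, v in known_ppis:
        for members, ds in zip(complexes, sets):
            if u in members and v in members:
                ds.union(u, v)

    added = []
    while any(ds.count > 1 for ds in sets):
        # gain[(u, v)] = number of complexes in which adding u-v merges
        # two currently separate components.
        gain = {}
        for members, ds in zip(complexes, sets):
            if ds.count <= 1:
                continue
            for u, v in combinations(sorted(members), 2):
                if ds.find(u) != ds.find(v):
                    gain[(u, v)] = gain.get((u, v), 0) + 1

        # Pick the pair with the largest gain; ties go to the
        # lexicographically smallest pair.
        u, v = max(sorted(gain), key=gain.get)
        added.append((u, v))
        for members, ds in zip(complexes, sets):
            if u in members and v in members:
                ds.union(u, v)
    return added


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    ppis = [("A", "B")]
    cyc = [{"A", "B", "C"}, {"B", "C", "D"}]
    print(greedy_additional_ppis(ppis, cyc))
```

On the toy input, the pair B-C is chosen first because it reduces the number of components in both complexes at once, which is the intuition behind sharing added interactions across overlapping complexes. Recomputing all gains in every round is quadratic in complex size and is kept here only for clarity.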
