BIPSPI: a method for the prediction of partner-specific protein–protein interfaces

Motivation: Protein‐protein interactions (PPIs) are essential for most cellular processes, so unveiling how proteins interact is a crucial question, and one that can be better understood by identifying which residues are responsible for the interaction. Computational approaches are orders of magnitude cheaper and faster than experimental ones, which has led to a proliferation of methods that aim to predict which residues belong to the interface of an interaction.

Results: We present BIPSPI, a new machine learning‐based method for the prediction of partner‐specific PPI sites. Contrary to most binding site prediction methods, the proposed approach takes into account a pair of interacting proteins rather than a single one in order to predict partner‐specific binding sites. BIPSPI has been trained on sequence‐based and structural features from both protein partners of each complex compiled in the Protein‐Protein Docking Benchmark version 5.0 and in an additional, independently compiled set. A version trained only on sequences has also been developed. The performance of our approach has been assessed by leave‐one‐out cross‐validation over different benchmarks, where it outperformed state‐of‐the‐art methods.

Availability and implementation: The BIPSPI web server is freely available at http://bipspi.cnb.csic.es. The BIPSPI code is available at https://github.com/bioinsilico/BIPSPI. A Docker image is available at https://hub.docker.com/r/bioinsilico/bipspi/.

Supplementary information: Supplementary data are available at Bioinformatics online.
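The partner‐specific idea described above can be illustrated with a minimal sketch. It is not the BIPSPI implementation: the function names, the concatenation of per‐residue features into pair features, the max‐over‐partners aggregation, and the reading of "leave‐one‐out" as holding out one complex at a time are all assumptions made for illustration.

```python
from itertools import product


def pair_features(ligand_feats, receptor_feats):
    """Build one feature vector per residue pair.

    Assumption: a pair is represented by concatenating the per-residue
    feature lists of one residue from each partner.
    """
    return {
        (i, j): lf + rf
        for (i, lf), (j, rf) in product(ligand_feats.items(), receptor_feats.items())
    }


def leave_one_complex_out(complex_ids):
    """Yield (training_ids, held_out_id) so that each complex is held out once.

    Assumption: cross-validation folds are defined per complex.
    """
    for held_out in complex_ids:
        yield [c for c in complex_ids if c != held_out], held_out


def residue_scores(pair_scores):
    """Collapse pair scores into per-residue scores for each partner.

    Assumption: a residue takes the highest score among all pairs it is in.
    """
    ligand, receptor = {}, {}
    for (i, j), score in pair_scores.items():
        ligand[i] = max(ligand.get(i, 0.0), score)
        receptor[j] = max(receptor.get(j, 0.0), score)
    return ligand, receptor
```

In this sketch, a classifier (not shown) would be trained on the pair features of the training complexes and would score the pairs of the held‐out complex; `residue_scores` then turns those pair scores into one interface score per residue on each partner.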
