Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites

Background: Protein-protein interactions (PPI) play a key role in the investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect.

Results: We have developed a novel prediction method called INSPiRE, which benefits from a knowledge base built from data available in the Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. The structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPI sites, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to the Matthews correlation coefficient.

Conclusion: In this paper we introduce a new knowledge-based method for the identification of protein-protein interaction sites called INSPiRE. Its knowledge base stores structural patterns of known interaction sites from the Protein Data Bank, which are then used for PPI site prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing approaches to PPI site prediction.
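The knowledge-base lookup described in the Results can be illustrated with a minimal Python sketch. It is not the INSPiRE implementation: the neighborhood encoding here (a text key built from the central residue label and the sorted labels of residues within `radius` edges) stands in for the bit-string encoding, and the function names, the `threshold` parameter, and the toy input format are all assumptions made for illustration. The `mcc` helper computes the standard Matthews correlation coefficient from confusion-matrix counts.

```python
from collections import defaultdict
from math import sqrt


def neighborhood_key(graph, labels, node, radius=1):
    """Hypothetical encoding: central label plus sorted labels within `radius` edges."""
    frontier, seen = {node}, {node}
    for _ in range(radius):
        frontier = {n for f in frontier for n in graph[f]} - seen
        seen |= frontier
    neighbors = sorted(labels[n] for n in seen if n != node)
    return labels[node] + "|" + ",".join(neighbors)


def build_knowledge_base(proteins):
    """Count how often each neighborhood key occurs as interface / non-interface.

    `proteins` is an iterable of (graph, labels, interface_nodes) triples, where
    `graph` maps a node to its neighbors (assumed input format).
    """
    kb = defaultdict(lambda: [0, 0])  # key -> [interface count, non-interface count]
    for graph, labels, interface in proteins:
        for node in graph:
            key = neighborhood_key(graph, labels, node)
            kb[key][0 if node in interface else 1] += 1
    return kb


def predict(kb, graph, labels, threshold=0.5):
    """Label a node as interface when its key is mostly interface in the knowledge base."""
    predicted = set()
    for node in graph:
        hits, misses = kb.get(neighborhood_key(graph, labels, node), (0, 0))
        total = hits + misses
        if total and hits / total >= threshold:
            predicted.add(node)
    return predicted


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In this sketch, neighborhoods that never occur in the knowledge base are labeled non-interface; how unseen neighborhoods should really be handled is a design choice the abstract does not specify.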
