Patch-DCA: Improved Protein Interface Prediction by utilizing Structural Information and Clustering DCA scores

Over the past decade there have been impressive advances in determining the 3D structures of protein complexes. However, the structures of many complexes remain unknown, even when the structures of the individual proteins are known. The growing availability of protein sequence data provides an opportunity to leverage evolutionary information to improve the accuracy of protein-protein interface prediction. To this end, several statistical and machine learning methods have been proposed. In particular, direct coupling analysis (DCA) has recently emerged as a promising approach for identifying protein contact maps from sequence information. However, the ability of these methods to detect inter-residue contacts between proteins remains relatively limited. In this work, we propose a method that integrates sequence and co-evolution information with structural and functional information to improve the performance of protein-protein interface prediction. Further, we present a post-processing clustering method that improves the average relative F1 score by 70% and 24% and the precision by 80% and 36% in comparison with two state-of-the-art methods, PSICOV and GREMLIN, respectively.
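For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how precision, F1, and a relative improvement over a baseline are conventionally computed for predicted inter-residue contacts. It uses the standard textbook definitions only; the function names, the representation of contacts as residue-index pairs, and the toy numbers are assumptions made for illustration and are not taken from the paper's evaluation code.

```python
def precision_recall_f1(predicted, true):
    """Standard precision, recall and F1 for sets of predicted and true contacts.

    Each contact is assumed (for illustration) to be a pair
    (residue index in protein A, residue index in protein B).
    """
    predicted, true = set(predicted), set(true)
    tp = len(predicted & true)  # correctly predicted contacts
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true) if true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def relative_improvement(new, baseline):
    """Relative change of a score with respect to a non-zero baseline score."""
    return (new - baseline) / baseline


# Toy example with made-up contacts (not data from the paper).
predicted = {(1, 2), (3, 4), (5, 6), (7, 8)}
true = {(1, 2), (3, 4), (9, 10)}
precision, recall, f1 = precision_recall_f1(predicted, true)
```

Under these definitions, a relative F1 improvement of 70% means that the new score equals 1.7 times the baseline score, for example a hypothetical rise from 0.20 to 0.34.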
