A machine learning approach for ranking clusters of docked protein‐protein complexes by pairwise cluster comparison

Reliable identification of near‐native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge‐based potentials, and the identification of many false positive binding sites is not unusual. Ranking protocols are often based on an initial clustering of docked poses, followed by the application of an energy function that ranks each cluster according to its lowest‐energy member. Here, we present an approach to cluster ranking that relies not on a single molecular descriptor (e.g., an energy function) alone but on a large number of descriptors integrated in a machine learning model: an extremely randomized tree classifier trained on 109 molecular descriptors. The protocol first locally enriches clusters with additional poses. The clusters are then characterized using features that describe the distribution of molecular descriptors within each cluster, and these features are combined into a pairwise cluster comparison model that discriminates near‐native from incorrect clusters. The results show that our approach is able to identify clusters containing near‐native protein–protein complexes. In addition, we analyze the descriptors with respect to their power to discriminate near‐native from incorrect clusters and show how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528–543. © 2016 Wiley Periodicals, Inc.
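
The pairwise cluster comparison described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn's ExtraTreesClassifier. This is not the authors' implementation. The function names, the synthetic data, the use of 5 descriptors instead of 109, the choice of summary statistics (minimum, mean, standard deviation), the encoding of a cluster pair as a feature difference, and the ranking by summed win probability are all assumptions made for illustration. The local enrichment step is omitted.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier


def cluster_features(poses):
    """Summarise a (n_poses, n_descriptors) array by per-descriptor statistics.

    Assumption: min, mean and std stand in for the distribution features.
    """
    return np.concatenate([poses.min(axis=0), poses.mean(axis=0), poses.std(axis=0)])


def pairwise_training_set(feats, is_near_native):
    """Build difference vectors for each (near-native, incorrect) pair, in both orders."""
    X, y = [], []
    for i in np.flatnonzero(is_near_native):
        for j in np.flatnonzero(~is_near_native):
            X.append(feats[i] - feats[j])  # near-native minus incorrect: label 1
            y.append(1)
            X.append(feats[j] - feats[i])  # incorrect minus near-native: label 0
            y.append(0)
    return np.array(X), np.array(y)


def rank_clusters(model, feats):
    """Rank clusters by the summed predicted probability of beating every other cluster."""
    n = len(feats)
    wins = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        diffs = feats[i] - feats[others]
        wins[i] = model.predict_proba(diffs)[:, 1].sum()
    return np.argsort(-wins), wins


def make_complex(rng, n_clusters=10, n_poses=20, n_desc=5, native_idx=0):
    """Synthetic complex: one cluster has descriptor values shifted lower (hypothetical data)."""
    feats, labels = [], []
    for c in range(n_clusters):
        shift = -3.0 if c == native_idx else 0.0
        poses = rng.normal(loc=shift, scale=1.0, size=(n_poses, n_desc))
        feats.append(cluster_features(poses))
        labels.append(c == native_idx)
    return np.array(feats), np.array(labels)


rng = np.random.default_rng(0)

# Train on pairs drawn from several synthetic complexes.
X_parts, y_parts = [], []
for _ in range(8):
    feats, labels = make_complex(rng, native_idx=int(rng.integers(10)))
    X, y = pairwise_training_set(feats, labels)
    X_parts.append(X)
    y_parts.append(y)

model = ExtraTreesClassifier(n_estimators=200, random_state=0)
model.fit(np.vstack(X_parts), np.concatenate(y_parts))

# Rank the clusters of an unseen synthetic complex whose near-native cluster is index 4.
test_feats, test_labels = make_complex(rng, native_idx=4)
order, wins = rank_clusters(model, test_feats)
print("ranking (best first):", order)
```

Encoding each pair in both orders keeps the training set balanced between the two classes, so the classifier does not learn a preference from the position of a cluster within the pair.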
