idDock+: Integrating Machine Learning in Probabilistic Search for Protein-Protein Docking

Predicting the three-dimensional native structures of protein dimers, a problem known as protein-protein docking, is key to understanding molecular interactions. Docking is computationally challenging due to the diversity of interactions and the high dimensionality of the configuration space. Existing methods draw configurations from this space either systematically or at random, and the inaccuracy of the scoring functions used to evaluate drawn configurations presents additional challenges. Evidence is growing that optimizing a scoring function is effective only once the drawn configuration is sufficiently similar to the native structure. In this article, we therefore present a method that optimizes a sophisticated energy function, FoldX, only to locally improve promising configurations. The key question of how to identify promising configurations is addressed by a machine learning method trained a priori on an extensive dataset of functionally diverse protein dimers. To cope with the vast configuration space, a probabilistic search algorithm operates on top of the learner, feeding it randomly drawn configurations. We refer to our method as idDock+, for informatics-driven Docking. idDock+ is tested on 15 dimers of different sizes and functional classes. Analysis shows that idDock+ finds a near-native structure on all systems and is comparable in accuracy to other state-of-the-art methods. idDock+ is one of the first highly efficient hybrid methods to combine fast machine learning models with demanding optimization of sophisticated energy scoring functions. Our results indicate that this is a promising direction for improving both efficiency and accuracy in docking.
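The pipeline described above has three layers: a probabilistic search that draws configurations at random, a learner that flags the promising ones, and local optimization of an energy function applied only to those flagged configurations. The Python sketch below illustrates that control flow under loud assumptions. The pose representation (`Config`), the classifier (`toy_is_promising`), and the energy (`toy_energy`) are hypothetical stand-ins: idDock+ uses a learner trained on protein dimers and the FoldX energy function, neither of which is reproduced here. The toy classifier also "peeks" at a known native pose, which a real learner cannot do. Finally, the greedy stochastic descent is a simple substitute, not necessarily the local optimizer idDock+ uses.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical rigid-body pose of the ligand relative to the receptor."""
    angles: Tuple[float, float, float]  # rotation angles, radians
    shift: Tuple[float, float, float]   # translation, Angstroms


def random_config(rng: random.Random, max_shift: float = 20.0) -> Config:
    """Probabilistic search layer: draw a configuration uniformly at random."""
    angles = tuple(rng.uniform(-math.pi, math.pi) for _ in range(3))
    shift = tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))
    return Config(angles, shift)


def perturb(cfg: Config, rng: random.Random,
            step_angle: float = 0.1, step_shift: float = 0.5) -> Config:
    """Small Gaussian move in pose space, used by the local optimizer."""
    angles = tuple(a + rng.gauss(0.0, step_angle) for a in cfg.angles)
    shift = tuple(s + rng.gauss(0.0, step_shift) for s in cfg.shift)
    return Config(angles, shift)


def local_optimize(cfg: Config, energy: Callable[[Config], float],
                   rng: random.Random, iters: int = 200) -> Tuple[Config, float]:
    """Greedy stochastic descent: keep a perturbation only if it lowers the energy."""
    best, best_e = cfg, energy(cfg)
    for _ in range(iters):
        cand = perturb(best, rng)
        cand_e = energy(cand)
        if cand_e < best_e:
            best, best_e = cand, cand_e
    return best, best_e


def dock(n_samples: int, is_promising: Callable[[Config], bool],
         energy: Callable[[Config], float], seed: int = 0) -> List[Tuple[Config, float]]:
    """Sample at random, let the learner filter, and optimize only what it accepts."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        cfg = random_config(rng)
        if is_promising(cfg):  # cheap learner call gates the expensive energy
            results.append(local_optimize(cfg, energy, rng))
    results.sort(key=lambda r: r[1])  # lowest energy first
    return results


# --- Hypothetical stand-ins (NOT the trained learner or FoldX) ---
NATIVE = Config(angles=(0.3, -1.2, 2.0), shift=(5.0, 0.0, -3.0))


def toy_energy(cfg: Config) -> float:
    """Toy energy with its minimum (0.0) at NATIVE."""
    d_shift = sum((s - n) ** 2 for s, n in zip(cfg.shift, NATIVE.shift))
    d_rot = sum(1.0 - math.cos(a - n) for a, n in zip(cfg.angles, NATIVE.angles))
    return d_shift + 10.0 * d_rot


def toy_is_promising(cfg: Config) -> bool:
    """Placeholder classifier: accepts poses within 10 Angstroms of NATIVE."""
    return math.dist(cfg.shift, NATIVE.shift) < 10.0


if __name__ == "__main__":
    results = dock(500, toy_is_promising, toy_energy)
    print(f"{len(results)} of 500 sampled poses were optimized")
    print(f"best toy energy: {results[0][1]:.3f}")
```

The design point the sketch tries to capture is the gating. The energy function is called only for configurations the learner accepts, so in a real setting the cost of an expensive scoring function such as FoldX is paid on a small fraction of the random samples rather than on all of them.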
