Bayesian active learning for optimization and uncertainty quantification in protein docking

Motivation: Ab initio protein docking poses a major challenge: optimizing a noisy and costly “black box”-like function in a high-dimensional space. Despite progress in this field, no docking method is available for rigorous uncertainty quantification (UQ) of its solution quality (e.g. interface RMSD or iRMSD).

Results: We introduce a novel algorithm, Bayesian Active Learning (BAL), for optimization and UQ of such black-box functions and for flexible protein docking. BAL directly models the posterior distribution of the global optimum (or native structures for protein docking), with active sampling and posterior estimation iteratively feeding each other. Furthermore, we use complex normal modes to represent a homogeneous Euclidean conformation space suitable for high-dimensional optimization, and we construct funnel-like energy models for encounter complexes. Over a protein docking benchmark set and a CAPRI set including homology docking, we establish that BAL significantly improves on both the starting points from rigid docking and the refinements by particle swarm optimization, providing a top-3 near-native prediction for one third of the targets. BAL also generates tight confidence intervals, with a half range around 25% of iRMSD and a confidence level of 85%. Its estimated probability of a prediction being native or not achieves a binary classification AUROC of 0.93 and an AUPRC over 0.60 (compared to 0.14 by chance), and is also found to help rank predictions. To the best of our knowledge, this study represents the first uncertainty quantification solution for protein docking, with theoretical rigor and comprehensive assessment.

Availability: Source codes are available at https://github.com/Shen-Lab/BAL.

Contact: yshen@tamu.edu

Supplementary information: https://github.com/Shen-Lab/BAL/tree/master/Paper_SI/
