Incorporating structural similarity into a scoring function to enhance the prediction of binding affinities

In this study, we developed a novel algorithm that improves the screening performance of an arbitrary docking scoring function by recalibrating the docking score of a query compound based on its structural similarity to a set of training compounds, at negligible extra computational cost. Two popular docking methods, Glide and AutoDock Vina, were adopted as the original scoring functions to be processed with the new algorithm, and similar improvements were achieved for both. Predicted binding affinities were compared against experimental data from the ChEMBL and DUD-E databases. Eleven representative drug receptors from diverse drug target categories were used to evaluate the hybrid scoring function. The effects of four fingerprints (FP2, FP3, FP4, and MACCS) and four compound similarity effect (CSE) functions were explored. Encouragingly, the screening performance improved significantly for all 11 drug targets, especially when CSE = S^4 (where S is the Tanimoto structural similarity) and the FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 and from 0.39 to 0.71 for the Glide and AutoDock Vina scoring functions, respectively. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on the structural similarity to mimic the real-world scenario of screening diverse libraries, in which query ligands are general-purpose screening compounds that are not necessarily structurally similar to the reference ligands. Even under this constraint, the hybrid scoring function still outperformed the original docking scoring function. The hybrid scoring function was further evaluated using external datasets for two systems, for which the PI values increased from 0.24 to 0.46 and from 0.14 to 0.42 for the A2AR and CFX systems, respectively.
In conclusion, our calibration algorithm can significantly improve virtual screening performance in both the drug lead optimization and identification phases at negligible computational cost.
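
The quantities named above can be illustrated with a minimal Python sketch. The Tanimoto similarity and the S^4 form of the CSE follow the abstract. The predictive index follows the definition commonly attributed to Pearlman and Charifson, which is assumed here to be the one used. The `recalibrate` function is purely hypothetical: the abstract does not give the hybrid scoring formula, so the CSE-weighted mean residual below is only one plausible way to combine a docking score with training-set information. It is not the authors' algorithm. Fingerprints are represented as plain sets of on-bit indices rather than real FP2 or MACCS output.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def cse(similarity, power=4):
    """Compound similarity effect; power=4 gives the CSE = S^4 form."""
    return similarity ** power


def recalibrate(query_fp, query_score, training):
    """HYPOTHETICAL recalibration, not the published formula.

    Shifts the docking score by the CSE-weighted mean residual
    (experimental value minus docking score) of the training compounds.
    `training` is an iterable of (fingerprint, docking_score, experimental).
    """
    weights, residuals = [], []
    for fp, docked, experimental in training:
        weights.append(cse(tanimoto(query_fp, fp)))
        residuals.append(experimental - docked)
    total = sum(weights)
    if total == 0.0:
        # No training compound shares any bit with the query: keep the score.
        return query_score
    correction = sum(w * r for w, r in zip(weights, residuals)) / total
    return query_score + correction


def predictive_index(experimental, predicted):
    """Predictive index: rank agreement weighted by experimental differences.

    Each pair contributes +1 if it is ranked in the same order by both
    lists, -1 if the order is reversed, and 0 if the predictions tie.
    """
    numerator = denominator = 0.0
    for i, j in combinations(range(len(experimental)), 2):
        d_exp = experimental[j] - experimental[i]
        d_pred = predicted[j] - predicted[i]
        weight = abs(d_exp)
        if d_pred == 0 or d_exp == 0:
            agreement = 0
        else:
            agreement = 1 if d_exp * d_pred > 0 else -1
        numerator += weight * agreement
        denominator += weight
    return numerator / denominator if denominator else 0.0
```

Because S lies between 0 and 1, raising it to the fourth power sharply suppresses the influence of dissimilar training compounds: a similarity of 0.5 contributes a weight of only 0.0625, while a near-identical compound keeps a weight close to 1.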
