ALADDIN: Docking Approach Augmented by Machine Learning for Protein Structure Selection Yields Superior Virtual Screening Performance

Protein flexibility and solvation pose major challenges to docking algorithms and scoring functions. One established strategy for addressing these challenges is to dock against multiple protein conformations (all‐against‐all ensemble docking). Recent studies have shown that the performance of ensemble docking can be improved by selecting the most relevant protein structures for docking. In search of a robust approach to protein structure selection, we developed an integrated mAchine Learning AnD DockINg approach (ALADDIN). ALADDIN employs a battery of random forest classifiers to select, individually for each compound of interest, the single most suitable protein structure for docking from an ensemble of protein structures. ALADDIN outperformed the best single‐structure docking runs, ensemble docking and a similarity‐based docking approach on three out of four investigated targets, with area under the receiver operating characteristic curve (AUC) values up to 0.15, 0.11 and 0.16 higher, respectively. Only for cytochrome P450 3A4 did ALADDIN, like all of the other tested approaches, fail to obtain decent performance. ALADDIN can be particularly useful for structure‐based virtual screening of malleable proteins, including kinases, some viral enzymes and anti‐targets.
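The per-compound selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the suitability values stand in for the outputs of the per-structure random forest classifiers, and all compound names, structure labels, docking scores and function names are made-up assumptions for illustration. For each compound, the sketch picks the structure with the highest suitability, keeps only the docking score from that structure, and computes the AUC over the resulting ranking.

```python
from typing import Dict, Sequence


def select_structure(suitability: Dict[str, float]) -> str:
    """Return the structure with the highest predicted suitability."""
    return max(suitability, key=suitability.get)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of active/decoy pairs ranked correctly (ties count 0.5)."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


def selected_scores(
    suitability: Dict[str, Dict[str, float]],
    docking: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """For each compound, keep the docking score from its selected structure.

    Docking scores are assumed to be 'more negative is better', so they are
    negated to give a 'higher is better' ranking score.
    """
    return {
        cpd: -docking[cpd][select_structure(per_structure)]
        for cpd, per_structure in suitability.items()
    }


# Toy, made-up values: stand-ins for classifier outputs (one classifier per structure).
suitability = {
    "cpd1": {"A": 0.9, "B": 0.2},
    "cpd2": {"A": 0.1, "B": 0.8},
    "cpd3": {"A": 0.6, "B": 0.4},
    "cpd4": {"A": 0.3, "B": 0.7},
}
# Toy, made-up docking scores per compound and structure.
docking = {
    "cpd1": {"A": -9.0, "B": -6.0},
    "cpd2": {"A": -5.5, "B": -8.5},
    "cpd3": {"A": -6.5, "B": -7.0},
    "cpd4": {"A": -7.5, "B": -6.0},
}
labels = {"cpd1": 1, "cpd2": 1, "cpd3": 0, "cpd4": 0}  # 1 = active, 0 = decoy

compounds = list(labels)
ranked = selected_scores(suitability, docking)
auc_selected = roc_auc([ranked[c] for c in compounds], [labels[c] for c in compounds])
auc_single_a = roc_auc([-docking[c]["A"] for c in compounds], [labels[c] for c in compounds])
print(f"selected-structure AUC: {auc_selected:.3f}, structure A only: {auc_single_a:.3f}")
```

The toy numbers are chosen only to exercise the code path and say nothing about real targets. In practice the suitability values would come from trained classifiers (for example, the predicted probabilities of a random forest), and the docking scores from an actual docking program.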
