Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats

Fast screening of drug molecules based on ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprints are a promising method for developing molecular docking surrogates with high throughput and high fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models predict docking scores with high accuracy, with a mean squared error lower than 0.21 kcal/mol for most of the docking targets, a significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable to unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With accuracy comparable to that of target-specific graph neural fingerprint models, the transferable model achieves training and data efficiency several times higher. We highlight that the impact of this study extends beyond the COVID-19 dataset, as our approach for fast virtual ligand screening can be easily adapted and integrated into a general machine learning-accelerated pipeline to battle future bio-threats.
