CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models

The novel nature of SARS-CoV-2 calls for efficient de novo drug design approaches. In this study, we propose an end-to-end framework, named CogMol (Controlled Generation of Molecules), for designing new drug-like small molecules that target novel viral proteins with high affinity and off-target selectivity. CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) with an efficient multi-attribute controlled sampling scheme that uses guidance from attribute predictors trained on latent features. To generate novel and optimal drug-like molecules for unseen viral targets, CogMol leverages a protein-molecule binding affinity predictor trained on SMILES VAE embeddings and on protein sequence embeddings learned without supervision from a large sequence corpus. The CogMol framework is applied to three SARS-CoV-2 target proteins: the main protease, the receptor-binding domain of the spike protein, and the non-structural protein 9 replicase. The generated candidates are novel at both the molecular and the chemical scaffold level when compared with the training data. CogMol also includes in silico screening: a multi-task toxicity classifier assesses the toxicity of parent molecules and their metabolites, a chemical retrosynthesis predictor assesses synthetic feasibility, and docking simulations assess binding to the target structure. Docking reveals favorable binding of the generated molecules to the target protein structure: 87-95% of high-affinity molecules showed a docking free energy below -6 kcal/mol. Compared with approved drugs, the majority of designed compounds show low parent-molecule and metabolite toxicity and high synthetic feasibility. In summary, CogMol handles multi-constraint design of synthesizable, low-toxicity, drug-like molecules with high target specificity and selectivity, and it requires neither target-dependent fine-tuning of the framework nor target structure information.
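The controlled sampling idea can be sketched as rejection sampling in the VAE latent space: draw latent vectors, score them with attribute predictors, and keep a vector with probability equal to the product of the predicted attribute probabilities. This is a minimal illustration under several assumptions, not CogMol's implementation. The latent size, the predictors `p_high_affinity` and `p_nontoxic`, their weights, and the protein embedding `target` are all hypothetical. Latent vectors are drawn from the standard normal VAE prior, whereas the actual scheme may use a density model fitted to encoded training molecules. Attributes are assumed conditionally independent given the latent vector, and decoding accepted vectors into SMILES with the VAE decoder is omitted.

```python
import math
import random

LATENT_DIM = 8  # hypothetical latent size, far smaller than a real SMILES VAE


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_prior(dim: int, rng: random.Random) -> list[float]:
    """Draw z ~ N(0, I), the standard VAE prior (a simplifying assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def p_high_affinity(z: list[float], protein_emb: list[float]) -> float:
    """Hypothetical target-conditioned predictor: P(high affinity | z, target).

    Stands in for a model trained on molecule latents plus protein sequence
    embeddings; here it is only a logistic score of their dot product, which
    assumes the protein embedding was projected to the latent size.
    """
    score = sum(zi * pi for zi, pi in zip(z, protein_emb))
    return sigmoid(score)


def p_nontoxic(z: list[float]) -> float:
    """Hypothetical attribute predictor: P(non-toxic | z), made-up weights."""
    return sigmoid(1.0 - 0.5 * z[0])


def controlled_sample(n: int, protein_emb: list[float], seed: int = 0,
                      max_tries: int = 100_000) -> tuple[list[list[float]], int]:
    """Rejection sampling in latent space.

    Each proposal z is accepted with probability equal to the product of the
    attribute probabilities, i.e. attributes are treated as conditionally
    independent given z. Returns the accepted latents and the proposal count.
    """
    rng = random.Random(seed)
    accepted: list[list[float]] = []
    tries = 0
    while len(accepted) < n and tries < max_tries:
        tries += 1
        z = sample_prior(len(protein_emb), rng)
        accept_prob = p_high_affinity(z, protein_emb) * p_nontoxic(z)
        if rng.random() < accept_prob:
            accepted.append(z)  # a real pipeline would decode z to SMILES here
    return accepted, tries


if __name__ == "__main__":
    target = [0.5] * LATENT_DIM  # hypothetical protein embedding
    samples, tries = controlled_sample(100, target)
    rate = len(samples) / tries
    print(f"accepted {len(samples)} of {tries} proposals (rate {rate:.2f})")
```

Multiplying the predictor probabilities lets a single acceptance test enforce several attributes at once. Because sampling happens in latent space, targeting a new protein in this sketch only changes the embedding passed to the affinity predictor; the generative model itself is not retrained.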
