MERMAID: an open source automated hit-to-lead method based on deep reinforcement learning

The hit-to-lead process improves the physicochemical properties of hit molecules, i.e., molecules that show the desired type of activity in a screening assay, so that they become more drug-like. Deep learning-based molecular generative models are expected to contribute to this process. The simplified molecular input line entry system (SMILES), a string of alphanumeric characters representing the chemical structure of a molecule, is one of the most commonly used molecular representations, and SMILES-based generative models have achieved significant success. However, in contrast to molecular graphs, the partial strings produced during SMILES generation are generally not valid SMILES. In addition, it is difficult to generate molecules starting from a given molecule, which makes it difficult to apply SMILES-based models to the hit-to-lead process. In this study, we developed a SMILES-based generative model that can generate molecules starting from a given molecule. The method generates partial SMILES and inserts them into the original SMILES using Monte Carlo Tree Search and a Recurrent Neural Network. We validated the method on a molecule dataset obtained from the ZINC database and successfully generated molecules that were well optimized for the quantitative estimate of drug-likeness (QED) and penalized octanol-water partition coefficient (PLogP) objectives. The source code is available at https://github.com/sekijima-lab/mermaid .
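The two ideas in the abstract, that an intermediate SMILES string is usually not a valid SMILES and that a partial SMILES can be inserted into an existing one, can be illustrated with a small sketch. This is not the MERMAID implementation: the function names (`is_balanced`, `insertion_points`, `candidate_insertions`) are hypothetical, the check is purely syntactic (branch parentheses and ring-closure labels only, no valence or aromaticity), and the atom pattern covers only a common subset of SMILES. In the method described above, the fragment would come from the Recurrent Neural Network and the choice of insertion would be guided by Monte Carlo Tree Search; here the fragment is simply given.

```python
import re

# Assumption: a simplified atom token (bracket atom, Cl, Br, or a common
# one-letter atom), followed by any ring-closure labels attached to it.
ATOM = re.compile(r"(?:\[[^\]]+\]|Cl|Br|[BCNOPSFIbcnops])(?:[=#]?(?:%\d\d|\d))*")


def is_balanced(smiles: str) -> bool:
    """Crude syntactic check: parentheses balance and ring labels pair up.

    This is NOT a chemical validity check; it only shows why a prefix of a
    SMILES string is often not itself a complete SMILES.
    """
    depth = 0
    open_rings = set()
    in_bracket = False
    for ch in smiles:
        if ch == "[":
            in_bracket = True
        elif ch == "]":
            in_bracket = False
        elif in_bracket:
            continue  # digits inside [..] are isotopes or counts, not rings
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            open_rings ^= {ch}  # first occurrence opens, second closes
    return depth == 0 and not open_rings and not in_bracket


def insertion_points(smiles: str) -> list[int]:
    """String offsets just after each atom token and its ring-closure labels."""
    return [m.end() for m in ATOM.finditer(smiles)]


def candidate_insertions(smiles: str, fragment: str) -> list[str]:
    """Insert `fragment` after each atom and keep syntactically balanced results."""
    out = []
    for pos in insertion_points(smiles):
        candidate = smiles[:pos] + fragment + smiles[pos:]
        if is_balanced(candidate):
            out.append(candidate)
    return out


if __name__ == "__main__":
    print(is_balanced("c1ccc"))  # a prefix of benzene: ring 1 is still open
    for s in candidate_insertions("c1ccccc1", "(C)"):
        print(s)
```

Candidates that pass this check would still need to be parsed by a cheminformatics toolkit and scored (for example with QED or PLogP) before being used as a reward signal.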
