A de novo molecular generation method using latent vector based generative adversarial network

Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecular design. We applied the method in two scenarios: one to generate random drug-like compounds and another to generate target-biased compounds. Our results show that the method works well in both cases. Sampled compounds from the trained model can largely occupy the same chemical space as the training set and also generate a substantial fraction of novel compounds. Moreover, the drug-likeness score of compounds sampled from LatentGAN is also similar to that of the training set. Lastly, generated compounds differ from those obtained with a Recurrent Neural Network-based generative model approach, indicating that both methods can be used complementarily.

[1]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[2]  Peter Ertl,et al.  Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions , 2009, J. Cheminformatics.

[3]  Adam Roberts,et al.  Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models , 2017, ICLR.

[4]  O. Engkvist,et al.  Direct Steering of de novo Molecular Generation using Descriptor Conditional Recurrent Neural Networks (cRNNs) , 2019 .

[5]  Sepp Hochreiter,et al.  Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery , 2018, J. Chem. Inf. Model..

[6]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[7]  Regina Barzilay,et al.  Junction Tree Variational Autoencoder for Molecular Graph Generation , 2018, ICML.

[8]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[9]  Huaiyu Zhu On Information and Sufficiency , 1997 .

[10]  Thomas Blaschke,et al.  The rise of deep learning in drug discovery. , 2018, Drug discovery today.

[11]  G. V. Paolini,et al.  Quantifying the chemical beauty of drugs. , 2012, Nature chemistry.

[12]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[13]  Ola Engkvist,et al.  Cheminformatics in Drug Discovery, an Industrial Perspective , 2018, Molecular informatics.

[14]  Yang Song,et al.  Age Progression/Regression by Conditional Adversarial Autoencoder , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Markus Hartenfeller,et al.  Reaction-driven de novo design, synthesis and testing of potential type II kinase inhibitors. , 2011, Future medicinal chemistry.

[16]  F. Lombardo,et al.  Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings , 1997 .

[17]  Dmitry Vetrov,et al.  Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery. , 2018, Molecular pharmaceutics.

[18]  Thierry Kogej,et al.  Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks , 2017, ACS central science.

[19]  Lars Carlsson,et al.  ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics , 2017, Journal of Cheminformatics.

[20]  G. Bemis,et al.  The properties of known drugs. 1. Molecular frameworks. , 1996, Journal of medicinal chemistry.

[21]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[22]  Lars Carlsson,et al.  Erratum to: ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics , 2017, Journal of Cheminformatics.

[23]  Nicola De Cao,et al.  MolGAN: An implicit generative model for small molecular graphs , 2018, ArXiv.

[24]  Yibo Li,et al.  Multi-objective de novo drug design with conditional graph generative model , 2018, Journal of Cheminformatics.

[25]  Petra Schneider,et al.  De Novo Design at the Edge of Chaos. , 2016, Journal of medicinal chemistry.

[26]  Andrey Kazennov,et al.  The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology , 2016, Oncotarget.

[27]  Thomas Blaschke,et al.  Application of Generative Autoencoder in De Novo Molecular Design , 2017, Molecular informatics.

[28]  Alán Aspuru-Guzik,et al.  Optimizing distributions over molecular space. An Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC) , 2017 .

[29]  Alán Aspuru-Guzik,et al.  Reinforced Adversarial Neural Computer for de Novo Molecular Design , 2018, J. Chem. Inf. Model..

[30]  Jin Woo Kim,et al.  Molecular generative model based on conditional variational autoencoder for de novo molecular design , 2018, Journal of Cheminformatics.

[31]  Sergio Gomez Colmenarejo,et al.  Hybrid computing using a neural network with dynamic external memory , 2016, Nature.

[32]  Esben Jannik Bjerrum,et al.  Improving Chemical Autoencoder Latent Space and Molecular De Novo Generation Diversity with Heteroencoders , 2018, Biomolecules.

[33]  Alán Aspuru-Guzik,et al.  Optimizing distributions over molecular space. An Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC) , 2017 .

[34]  Thomas Blaschke,et al.  Molecular de-novo design through deep reinforcement learning , 2017, Journal of Cheminformatics.

[35]  Gisbert Schneider,et al.  Deep Learning in Drug Discovery , 2016, Molecular informatics.

[36]  Evgeny Putin,et al.  Adversarial Threshold Neural Computer for Molecular de Novo Design. , 2018, Molecular pharmaceutics.

[37]  Alán Aspuru-Guzik,et al.  Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models , 2017, ArXiv.

[38]  Sean Ekins The Next Era: Deep Learning in Pharmaceutical Research , 2016, Pharmaceutical Research.

[39]  Razvan Pascanu,et al.  Learning Deep Generative Models of Graphs , 2018, ICLR 2018.

[40]  Alán Aspuru-Guzik,et al.  Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules , 2016, ACS central science.

[41]  Ola Engkvist,et al.  Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks , 2020, Nature Machine Intelligence.

[42]  Brian K. Shoichet,et al.  ZINC - A Free Database of Commercially Available Compounds for Virtual Screening , 2005, J. Chem. Inf. Model..

[43]  Russ B Altman,et al.  Machine learning in chemoinformatics and drug discovery. , 2018, Drug discovery today.

[44]  Matt J. Kusner,et al.  Grammar Variational Autoencoder , 2017, ICML.

[45]  Yun Luo,et al.  EEG Data Augmentation for Emotion Recognition Using a Conditional Wasserstein GAN , 2018, 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).

[46]  Navdeep Jaitly,et al.  Adversarial Autoencoders , 2015, ArXiv.

[47]  George Papadatos,et al.  The ChEMBL database in 2017 , 2016, Nucleic Acids Res..

[48]  Ola Engkvist,et al.  Randomized SMILES strings improve the quality of molecular generative models , 2019, Journal of Cheminformatics.

[49]  Esben Jannik Bjerrum,et al.  SMILES Enumeration as Data Augmentation for Neural Network Modeling of Molecules , 2017, ArXiv.

[50]  Alán Aspuru-Guzik,et al.  Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models , 2018, Frontiers in Pharmacology.

[51]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[52]  Andreas Persidis,et al.  Artificial intelligence for drug design , 1997, Nature Biotechnology.

[53]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[54]  Jure Leskovec,et al.  Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation , 2018, NeurIPS.

[55]  Lorenz C. Blum,et al.  Classification of Organic Molecules by Molecular Quantum Numbers , 2009, ChemMedChem.

[56]  Xing Chen,et al.  Inferring potential small molecule–miRNA association based on triple layer heterogeneous network , 2018, Journal of Cheminformatics.

[57]  Thomas Blaschke,et al.  Exploring the GDB-13 chemical space using deep generative models , 2018, Journal of Cheminformatics.

[58]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..