Conditional molecular design with deep generative models

Although machine learning has been successfully used to propose novel molecules that satisfy desired properties, it is still challenging to explore a large chemical space efficiently. In this paper, we present a conditional molecular design method that facilitates generating new molecules with desired properties. The proposed model, which simultaneously performs both property prediction and molecule generation, is built as a semisupervised variational autoencoder trained on a set of existing molecules with only a partial annotation. We generate new molecules with desired properties by sampling from the generative distribution estimated by the model. We demonstrate the effectiveness of the proposed model by evaluating it on drug-like molecules. The model improves the performance of property prediction by exploiting unlabeled molecules and efficiently generates novel molecules fulfilling various target conditions.

[1]  Anubhav Jain,et al.  From the computer to the laboratory: materials discovery and design using first-principles calculations , 2012, Journal of Materials Science.

[2]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[3]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[4]  Thomas Blaschke,et al.  Molecular de-novo design through deep reinforcement learning , 2017, Journal of Cheminformatics.

[5]  Olexandr Isayev,et al.  Deep reinforcement learning for de novo drug design , 2017, Science Advances.

[6]  Alán Aspuru-Guzik,et al.  What Is High-Throughput Virtual Screening? A Perspective from Organic Materials Discovery , 2015 .

[7]  Alán Aspuru-Guzik,et al.  Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models , 2017, ArXiv.

[8]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[9]  Alán Aspuru-Guzik,et al.  Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules , 2016, ACS central science.

[10]  Vijay S. Pande,et al.  Molecular graph convolutions: moving beyond fingerprints , 2016, Journal of Computer-Aided Molecular Design.

[11]  Pierre Baldi,et al.  Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules , 2013, J. Chem. Inf. Model..

[12]  Nicola De Cao,et al.  MolGAN: An implicit generative model for small molecular graphs , 2018, ArXiv.

[13]  Gang Fu,et al.  PubChem Substance and Compound databases , 2015, Nucleic Acids Res..

[14]  K. Müller,et al.  Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space , 2015, The journal of physical chemistry letters.

[15]  Max Welling,et al.  Semi-supervised Learning with Deep Generative Models , 2014, NIPS.

[16]  Igor I. Baskin,et al.  Machine Learning Methods for Property Prediction in Chemoinformatics: Quo Vadis? , 2012, J. Chem. Inf. Model..

[17]  Regina Barzilay,et al.  Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction , 2017, J. Chem. Inf. Model..

[18]  Niloy Ganguly,et al.  Designing Random Graph Models Using Variational Autoencoders With Applications to Chemical Design , 2018, ArXiv.

[19]  Gisbert Schneider,et al.  Trends in virtual combinatorial library design. , 2002, Current medicinal chemistry.

[20]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[21]  Matt J. Kusner,et al.  Grammar Variational Autoencoder , 2017, ICML.

[22]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[23]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[24]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[25]  Alex Graves,et al.  Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.

[26]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[27]  M. Karelson,et al.  QSPR: the correlation and quantitative prediction of chemical and physical properties from structure , 1995 .

[28]  K. Müller,et al.  Fast and accurate modeling of molecular atomization energies with machine learning. , 2011, Physical review letters.

[29]  Matthias Rupp,et al.  Big Data Meets Quantum Chemistry Approximations: The Δ-Machine Learning Approach. , 2015, Journal of chemical theory and computation.

[30]  Gordon M. Crippen,et al.  Prediction of Physicochemical Parameters by Atomic Contributions , 1999, J. Chem. Inf. Comput. Sci..

[31]  Jean-Marie Lehn,et al.  Dynamic Combinatorial Chemistry and Virtual Combinatorial Libraries , 1999 .

[32]  John J. Irwin,et al.  ZINC 15 – Ligand Discovery for Everyone , 2015, J. Chem. Inf. Model..

[33]  David Rogers,et al.  Extended-Connectivity Fingerprints , 2010, J. Chem. Inf. Model..

[34]  Nikos Komodakis,et al.  GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders , 2018, ICANN.

[35]  Stephen R. Heller,et al.  InChI, the IUPAC International Chemical Identifier , 2015, Journal of Cheminformatics.

[36]  Regina Barzilay,et al.  Junction Tree Variational Autoencoder for Molecular Graph Generation , 2018, ICML.

[37]  Abhinav Vishnu,et al.  Deep learning for computational chemistry , 2017, J. Comput. Chem..

[38]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[39]  G. V. Paolini,et al.  Quantifying the chemical beauty of drugs. , 2012, Nature chemistry.

[40]  Alán Aspuru-Guzik,et al.  Convolutional Networks on Graphs for Learning Molecular Fingerprints , 2015, NIPS.

[41]  J. Lehn,et al.  Virtual combinatorial libraries: dynamic generation of molecular and supramolecular diversity by self-assembly. , 1997, Proceedings of the National Academy of Sciences of the United States of America.

[42]  Robert P. Sheridan,et al.  Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships , 2015, J. Chem. Inf. Model..

[43]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[44]  Alexandre Tkatchenko,et al.  Quantum-chemical insights from deep tensor neural networks , 2016, Nature Communications.

[45]  Yoshua Bengio,et al.  Audio Chord Recognition with Recurrent Neural Networks , 2013, ISMIR.

[46]  Thierry Kogej,et al.  Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks , 2017, ACS central science.

[47]  Niloy Ganguly,et al.  NeVAE: A Deep Generative Model for Molecular Graphs , 2018, AAAI.

[48]  Steven Skiena,et al.  Syntax-Directed Variational Autoencoder for Structured Data , 2018, ICLR.