MolecularRNN: Generating realistic molecular graphs with optimized properties

Designing new molecules with a set of predefined properties is a core problem in modern drug discovery and development. There is a growing need for de-novo design methods that would address this problem. We present MolecularRNN, the graph recurrent generative model for molecular structures. Our model generates diverse realistic molecular graphs after likelihood pretraining on a big database of molecules. We perform an analysis of our pretrained models on large-scale generated datasets of 1 million samples. Further, the model is tuned with policy gradient algorithm, provided a critic that estimates the reward for the property of interest. We show a significant distribution shift to the desired range for lipophilicity, drug-likeness, and melting point outperforming state-of-the-art works. With the use of rejection sampling based on valency constraints, our model yields 100% validity. Moreover, we show that invalid molecules provide a rich signal to the model through the use of structure penalty in our reinforcement learning pipeline.

[1]  John P. Overington,et al.  ChEMBL: a large-scale bioactivity database for drug discovery , 2011, Nucleic Acids Res..

[2]  Igor V. Tetko,et al.  How Accurately Can We Predict the Melting Points of Drug-like Compounds? , 2014, J. Chem. Inf. Model..

[3]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[4]  Daniel W. Davies,et al.  Machine learning for molecular and materials science , 2018, Nature.

[5]  G. V. Paolini,et al.  Quantifying the chemical beauty of drugs. , 2012, Nature chemistry.

[6]  Brian K. Shoichet,et al.  ZINC - A Free Database of Commercially Available Compounds for Virtual Screening , 2005, J. Chem. Inf. Model..

[7]  J. Irwin,et al.  ZINC ? A Free Database of Commercially Available Compounds for Virtual Screening. , 2005 .

[8]  Qi Liu,et al.  Constrained Graph Variational Autoencoders for Molecule Design , 2018, NeurIPS.

[9]  Jure Leskovec,et al.  GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models , 2018, ICML.

[10]  Regina Barzilay,et al.  Junction Tree Variational Autoencoder for Molecular Graph Generation , 2018, ICML.

[11]  Razvan Pascanu,et al.  Learning Deep Generative Models of Graphs , 2018, ICLR 2018.

[12]  Alán Aspuru-Guzik,et al.  Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models , 2018, Frontiers in Pharmacology.

[13]  Thierry Kogej,et al.  Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks , 2017, ACS central science.

[14]  Gisbert Schneider,et al.  Computer-based de novo design of drug-like molecules , 2005, Nature Reviews Drug Discovery.

[15]  Peter Ertl,et al.  Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions , 2009, J. Cheminformatics.

[16]  Yibo Li,et al.  Multi-objective de novo drug design with conditional graph generative model , 2018, Journal of Cheminformatics.

[17]  Alán Aspuru-Guzik,et al.  Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules , 2016, ACS central science.

[18]  Jure Leskovec,et al.  Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation , 2018, NeurIPS.

[19]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[20]  Olexandr Isayev,et al.  Deep reinforcement learning for de novo drug design , 2017, Science Advances.

[21]  Thomas Blaschke,et al.  Molecular de-novo design through deep reinforcement learning , 2017, Journal of Cheminformatics.

[22]  Sergey Nikolenko,et al.  druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico. , 2017, Molecular pharmaceutics.

[23]  Alán Aspuru-Guzik,et al.  Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models , 2017, ArXiv.