MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization

Molecule optimization is a fundamental task for accelerating drug discovery, with the goal of generating new valid molecules that maximize multiple drug properties while maintaining similarity to the input molecule. Existing generative models and reinforcement learning approaches have achieved initial success, but still struggle to optimize multiple drug properties simultaneously. To address this challenge, we propose the MultI-constraint MOlecule SAmpling (MIMOSA) approach, a sampling framework that uses the input molecule as an initial guess and samples molecules from the target distribution. MIMOSA first pretrains two property-agnostic graph neural networks (GNNs) for molecule topology and substructure-type prediction, where a substructure can be either an atom or a single ring. In each iteration, MIMOSA uses the GNNs' predictions and applies three basic substructure operations (add, replace, delete) to generate new molecules and their associated weights. The weights can encode multiple constraints, including similarity and drug property constraints, and are used to select promising molecules for the next iteration. MIMOSA enables flexible encoding of multiple property and similarity constraints, can efficiently generate new molecules that satisfy various property constraints, and achieves up to 49.6% relative improvement in success rate over the best baseline.
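The iterative loop described above (propose candidates with add, replace, and delete operations, weight them by similarity and property constraints, then keep promising ones) can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: molecules are reduced to toy lists of substructure tokens, a fixed vocabulary (`VOCAB`) stands in for the pretrained GNNs' substructure predictions, set-based Tanimoto similarity stands in for fingerprint similarity, the weight is assumed to be the product of the similarity score and each property score, and the next iteration's molecules are drawn by weighted resampling. All names (`propose`, `tanimoto`, `weight`, `sample_optimize`, `has_ring`) are illustrative.

```python
import random
from typing import Callable, List, Sequence

# Toy stand-in for a molecule (assumption): an ordered list of substructure
# tokens, where each token is an atom ("C", "N", "O") or a single ring.
Molecule = List[str]
PropertyFn = Callable[[Molecule], float]

# Hypothetical vocabulary; in MIMOSA the candidate substructures would come
# from the pretrained GNNs' substructure-type predictions instead.
VOCAB = ["C", "N", "O", "benzene", "cyclohexane"]


def propose(mol: Molecule) -> List[Molecule]:
    """Enumerate neighbours using the three basic operations: add, replace, delete."""
    candidates: List[Molecule] = []
    for i, token in enumerate(mol):
        if len(mol) > 1:  # delete token i, but never empty the molecule
            candidates.append(mol[:i] + mol[i + 1:])
        for sub in VOCAB:  # replace token i with a different substructure
            if sub != token:
                candidates.append(mol[:i] + [sub] + mol[i + 1:])
    for sub in VOCAB:  # add a new substructure (appended at the end for simplicity)
        candidates.append(mol + [sub])
    return candidates


def tanimoto(a: Sequence[str], b: Sequence[str]) -> float:
    """Set-based Tanimoto similarity, a crude proxy for fingerprint similarity."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def weight(cand: Molecule, source: Molecule, props: Sequence[PropertyFn]) -> float:
    """Assumed weighting: similarity to the input times every property score."""
    w = tanimoto(cand, source)
    for prop in props:
        w *= prop(cand)
    return w


def sample_optimize(source: Molecule, props: Sequence[PropertyFn],
                    n_iter: int = 5, n_keep: int = 3, seed: int = 0) -> Molecule:
    """Start from the input molecule and iteratively resample weighted candidates."""
    rng = random.Random(seed)
    population: List[Molecule] = [list(source)]  # input molecule as initial guess
    for _ in range(n_iter):
        pool = [cand for mol in population for cand in propose(mol)]
        weights = [weight(c, source, props) for c in pool]
        total = sum(weights)
        # Weighted resampling; fall back to uniform if every weight is zero.
        population = rng.choices(pool, weights=weights if total > 0 else None, k=n_keep)
    return max(population, key=lambda m: weight(m, source, props))


# Hypothetical property: reward molecules that contain an aromatic ring.
has_ring = lambda m: 1.0 if "benzene" in m else 0.2
best = sample_optimize(["C", "C", "O"], [has_ring])
print(best, weight(best, ["C", "C", "O"], [has_ring]))
```

Multiplying the scores means that a candidate scoring near zero on any single constraint is rarely selected, which is one simple way to enforce several constraints at once; the paper's actual target distribution and selection rule may differ.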
