Differentiable Scaffolding Tree for Molecular Optimization

The structural design of functional molecules, also called molecular optimization, is an essential task in chemical science and engineering, with important applications such as drug discovery. Deep generative models and combinatorial optimization methods have achieved initial success, but they still struggle to model discrete chemical structures directly and often rely heavily on brute-force enumeration. The challenge stems from the discrete and non-differentiable nature of molecular structures. To address this, we propose the differentiable scaffolding tree (DST), which uses a learned knowledge network to convert discrete chemical structures into locally differentiable ones. DST enables gradient-based optimization on a chemical graph structure by back-propagating the derivatives of the target properties through a graph neural network (GNN). Our empirical studies show that gradient-based molecular optimization is both effective and sample-efficient. Furthermore, the learned graph parameters can provide explanations that help domain experts understand the model output.
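The abstract describes DST only at a high level. The toy sketch below illustrates the core idea: relax a discrete tree into continuous parameters, then back-propagate a property prediction through a GNN to decide how to edit it. It rests on several assumptions that do not come from this text. Node existence is relaxed to a sigmoid weight, and node identity to a softmax over a small hypothetical substructure vocabulary. The trained GNN is replaced by a frozen one-layer linear graph convolution whose per-substructure scores `c` are made up. All names (`forward`, `gradients`, `optimize`, `c`) are hypothetical. This is not the authors' implementation, and it omits how the surrogate is trained and how discrete molecules are decoded from the optimized parameters.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    z = sum(exps)
    return [e / z for e in exps]


def forward(theta, phi, a_hat, c):
    """Toy differentiable surrogate (assumption, not the paper's GNN).

    theta[i]: logit of node i's existence weight w_i = sigmoid(theta[i]).
    phi[i]:   logits of node i's relaxed identity p_i = softmax(phi[i]).
    a_hat:    scaffolding-tree adjacency plus self-loops (must be symmetric).
    c[v]:     hypothetical score of substructure v in a frozen surrogate.
    """
    n = len(theta)
    w = [sigmoid(t) for t in theta]
    p = [softmax(row) for row in phi]
    # Scalar node embedding: expected substructure score under p_i.
    s = [sum(pv * cv for pv, cv in zip(p[i], c)) for i in range(n)]
    ws = [w[j] * s[j] for j in range(n)]
    # One linear message-passing step over the weighted tree.
    h = [sum(a_hat[i][j] * ws[j] for j in range(n)) for i in range(n)]
    # Readout: node-weighted sum, y = sum_ij w_i A_ij w_j s_j.
    y = sum(w[i] * h[i] for i in range(n))
    return y, w, p, s, h


def gradients(theta, phi, a_hat, c):
    """Hand-derived back-propagation of y to theta and phi."""
    y, w, p, s, h = forward(theta, phi, a_hat, c)
    n = len(theta)
    g = [sum(a_hat[k][i] * w[i] for i in range(n)) for k in range(n)]  # (A_hat w)_k
    # dy/dw_k = h_k + s_k g_k, chained through dw/dtheta = w(1 - w).
    d_theta = [(h[k] + s[k] * g[k]) * w[k] * (1.0 - w[k]) for k in range(n)]
    # dy/ds_k = w_k g_k, and ds_k/dphi_kv = p_kv (c_v - s_k) for a softmax.
    d_phi = [[w[k] * g[k] * p[k][v] * (c[v] - s[k]) for v in range(len(c))]
             for k in range(n)]
    return y, d_theta, d_phi


def optimize(theta, phi, a_hat, c, lr=0.1, steps=100):
    """Plain gradient ascent on the relaxed tree parameters."""
    theta = list(theta)
    phi = [list(row) for row in phi]
    for _ in range(steps):
        _, d_theta, d_phi = gradients(theta, phi, a_hat, c)
        theta = [t + lr * d for t, d in zip(theta, d_theta)]
        phi = [[x + lr * d for x, d in zip(row, drow)] for row, drow in zip(phi, d_phi)]
    return theta, phi


if __name__ == "__main__":
    # Hypothetical 3-node path scaffold 0-1-2 and a 3-substructure vocabulary.
    a_hat = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
    c = [1.0, -0.5, 0.2]
    theta = [0.0, 0.0, 0.0]
    phi = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    y_before = forward(theta, phi, a_hat, c)[0]
    theta, phi = optimize(theta, phi, a_hat, c)
    y_after, w, p, _, _ = forward(theta, phi, a_hat, c)
    print(f"predicted property: {y_before:.3f} -> {y_after:.3f}")
    for i in range(len(w)):
        best = max(range(len(c)), key=lambda v: p[i][v])
        print(f"node {i}: weight {w[i]:.2f}, most likely substructure {best}")
```

In this toy, the optimized node weights and identity distributions play the role of the "learned graph parameters." A low weight suggests the surrogate favors removing a node, and a shifted identity distribution suggests a substructure replacement. The gradients are derived by hand only to keep the sketch dependency-free. A real implementation would more likely rely on an autodiff framework.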
