A Model to Search for Synthesizable Molecules

Deep generative models are able to suggest new organic molecules by generating strings, trees, and graphs representing their structure. While such models allow one to generate molecules with desirable properties, they give no guarantees that the molecules can actually be synthesized in practice. We propose a new molecule generation model, mirroring a more realistic real-world process, where (a) reactants are selected, and (b) combined to form more complex molecules. More specifically, our generative model proposes a bag of initial reactants (selected from a pool of commercially-available molecules) and uses a reaction model to predict how they react together to generate new molecules. We first show that the model can generate diverse, valid and unique molecules due to the useful inductive biases of modeling reactions. Furthermore, our model allows chemists to interrogate not only the properties of the generated molecules but also the feasibility of the synthesis routes. We conclude by using our model to solve retrosynthesis problems, predicting a set of reactants that can produce a target product.

[1]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[2]  Matt J. Kusner,et al.  A Generative Model For Electron Paths , 2018, ICLR.

[3]  Hiroshi Kajino,et al.  Molecular Hypergraph Grammar with its Application to Molecular Optimization , 2018, ICML.

[4]  Matthias Rarey,et al.  Similarity searching in large combinatorial chemistry spaces , 2001, J. Comput. Aided Mol. Des..

[5]  Jure Leskovec,et al.  Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation , 2018, NeurIPS.

[6]  Thomas Lengauer,et al.  Automatic Generation of Complementary Descriptors with Molecular Graph Networks , 2005, J. Chem. Inf. Model..

[7]  Constantine Bekas,et al.  “Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models† †Electronic supplementary information (ESI) available: Time-split test set and example predictions, together with attention weights, confidence and token probabilities. See DO , 2017, Chemical science.

[8]  Regina Barzilay,et al.  Learning Multimodal Graph-to-Graph Translation for Molecular Optimization , 2018, ICLR.

[9]  Jasper Snoek,et al.  Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.

[10]  Richard S. Zemel,et al.  Gated Graph Sequence Neural Networks , 2015, ICLR.

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Samy Bengio,et al.  Generating Sentences from a Continuous Space , 2015, CoNLL.

[13]  H. L. Morgan The Generation of a Unique Machine Description for Chemical Structures-A Technique Developed at Chemical Abstracts Service. , 1965 .

[14]  Matt J. Kusner,et al.  Bayesian Optimization with Inequality Constraints , 2014, ICML.

[15]  Alán Aspuru-Guzik,et al.  Neural Networks for the Prediction of Organic Chemistry Reactions , 2016, ACS central science.

[16]  G. V. Paolini,et al.  Quantifying the chemical beauty of drugs. , 2012, Nature chemistry.

[17]  Nicola De Cao,et al.  MolGAN: An implicit generative model for small molecular graphs , 2018, ArXiv.

[18]  Christopher A. Hunter,et al.  Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction , 2018, ACS central science.

[19]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[20]  Bernhard Schölkopf,et al.  Wasserstein Auto-Encoders , 2017, ICLR.

[21]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[22]  Florent Chevillard,et al.  SCUBIDOO: A Large yet Screenable and Easily Searchable Database of Computationally Created Chemical Compounds Optimized toward High Likelihood of Synthetic Tractability , 2015, J. Chem. Inf. Model..

[23]  Petra Schneider,et al.  De Novo Design at the Edge of Chaos. , 2016, Journal of medicinal chemistry.

[24]  Ryan G. Coleman,et al.  ZINC: A Free Tool to Discover Chemistry for Biology , 2012, J. Chem. Inf. Model..

[25]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[26]  Andrey Kazennov,et al.  The cornucopia of meaningful leads: Applying deep adversarial autoencoders for new molecule development in oncology , 2016, Oncotarget.

[27]  Marwin H. S. Segler,et al.  GuacaMol: Benchmarking Models for De Novo Molecular Design , 2018, J. Chem. Inf. Model..

[28]  Sepp Hochreiter,et al.  Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery , 2018, J. Chem. Inf. Model..

[29]  Regina Barzilay,et al.  Junction Tree Variational Autoencoder for Molecular Graph Generation , 2018, ICML.

[30]  Thierry Kogej,et al.  Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks , 2017, ACS central science.

[31]  Regina Barzilay,et al.  Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network , 2017, NIPS.

[32]  Alán Aspuru-Guzik,et al.  Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules , 2016, ACS central science.

[33]  Daniel D. Johnson,et al.  Learning Graphical State Transitions , 2016, ICLR.

[34]  Marwin H. S. Segler,et al.  Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction. , 2017, Chemistry.

[35]  Matt J. Kusner,et al.  Grammar Variational Autoencoder , 2017, ICML.

[36]  Matt J. Kusner,et al.  Learning a Generative Model for Validity in Complex Discrete Structures , 2017, ICLR.

[37]  Alán Aspuru-Guzik,et al.  Convolutional Networks on Graphs for Learning Molecular Fingerprints , 2015, NIPS.

[38]  Mike Preuss,et al.  Planning chemical syntheses with deep neural networks and symbolic AI , 2017, Nature.

[39]  Razvan Pascanu,et al.  Learning Deep Generative Models of Graphs , 2018, ICLR 2018.

[40]  Yoshua Bengio,et al.  DEFactor: Differentiable Edge Factorization-based Probabilistic Graph Generation , 2018, ArXiv.

[41]  Hugo Ceulemans,et al.  Large-scale comparison of machine learning methods for drug target prediction on ChEMBL , 2018, Chemical science.

[42]  Constantine Bekas,et al.  Molecular Transformer for Chemical Reaction Prediction and Uncertainty Estimation , 2018, ArXiv.

[43]  Alán Aspuru-Guzik,et al.  Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models , 2017, ArXiv.

[44]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[45]  Daniel M. Lowe Extraction of chemical structures and reactions from the literature , 2012 .

[46]  Nikos Komodakis,et al.  GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders , 2018, ICANN.

[47]  Alexander A. Alemi,et al.  Fixing a Broken ELBO , 2017, ICML.

[48]  Zhengwei Peng,et al.  LEAP into the Pfizer Global Virtual Library (PGVL) space: creation of readily synthesizable design ideas automatically. , 2011, Methods in molecular biology.

[49]  Alán Aspuru-Guzik,et al.  Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models , 2018, Frontiers in Pharmacology.

[50]  Niloy Ganguly,et al.  NeVAE: A Deep Generative Model for Molecular Graphs , 2018, AAAI.

[51]  Ji-Bo Wang,et al.  The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space , 2016, J. Chem. Inf. Model..

[52]  Steven Skiena,et al.  Syntax-Directed Variational Autoencoder for Structured Data , 2018, ICLR.

[53]  Qi Liu,et al.  Constrained Graph Variational Autoencoders for Molecule Design , 2018, NeurIPS.

[54]  Florent Chevillard,et al.  Virtual Compound Libraries in Computer-Assisted Drug Discovery , 2019, J. Chem. Inf. Model..

[55]  Pierre Baldi,et al.  Learning to Predict Chemical Reactions , 2011, J. Chem. Inf. Model..

[56]  Alán Aspuru-Guzik,et al.  What Is High-Throughput Virtual Screening? A Perspective from Organic Materials Discovery , 2015 .

[57]  Connor W. Coley,et al.  A graph-convolutional neural network model for the prediction of chemical reactivity , 2018, Chemical science.

[58]  Brian K. Shoichet,et al.  Virtual screening of chemical libraries , 2004, Nature.

[59]  G. Schneider,et al.  Enabling future drug discovery by de novo design , 2011 .