The Synthesizability of Molecules Proposed by Generative Models

The discovery of functional molecules is an expensive and time-consuming process, exemplified by the rising costs of small-molecule therapeutic discovery. One class of techniques of growing interest for early-stage drug discovery is de novo molecular generation and optimization, catalyzed by the development of new deep learning approaches. These techniques can suggest novel molecular structures intended to maximize a multi-objective function, e.g., suitability as a therapeutic against a particular target, without relying on brute-force exploration of a chemical space. However, the utility of these approaches is stymied by their ignorance of synthesizability. To highlight the severity of this issue, we use a data-driven computer-aided synthesis planning program to quantify how often molecules proposed by state-of-the-art generative models cannot be readily synthesized. Our analysis demonstrates that there are several tasks for which these models generate unrealistic molecular structures despite performing well on popular quantitative benchmarks. Synthetic complexity heuristics can successfully bias generation toward synthetically tractable chemical space, although doing so necessarily detracts from the primary objective. This analysis suggests that new algorithm development is warranted to improve the utility of these models in real discovery workflows.
