Planning chemical syntheses with deep neural networks and symbolic AI

To plan the syntheses of small organic molecules, chemists use retrosynthesis, a problem-solving technique in which target molecules are recursively transformed into increasingly simpler precursors. Computer-aided retrosynthesis would be a valuable tool, but at present it is slow and provides results of unsatisfactory quality. Here we use Monte Carlo tree search and symbolic artificial intelligence (AI) to discover retrosynthetic routes. We combine Monte Carlo tree search with an expansion policy network that guides the search and a filter network that pre-selects the most promising retrosynthetic steps. These deep neural networks were trained on essentially all reactions ever published in organic chemistry. Our system solves almost twice as many molecules as the traditional computer-aided search method, which is based on extracted rules and hand-designed heuristics, and does so thirty times faster. In a double-blind AB test, chemists on average considered our computer-generated routes to be equivalent to reported literature routes.
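
The recursive decomposition described above can be illustrated with a toy sketch. It is not the authors' system: it swaps Monte Carlo tree search for a plain depth-limited greedy recursion, and the molecule labels, the rule table, the prior scores (standing in for an expansion policy network), and the `is_plausible` check (standing in for the filter network) are all hypothetical placeholders.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data: string labels, not real chemistry.
BUILDING_BLOCKS = {"A", "B", "C"}

# Each target maps to candidate disconnections: (prior score, precursors).
# The prior is a stand-in for the output of an expansion policy network.
RULES: Dict[str, List[Tuple[float, List[str]]]] = {
    "ABC": [(0.7, ["AB", "C"]), (0.2, ["A", "BC"]), (0.1, ["X", "Y"])],
    "AB": [(0.9, ["A", "B"])],
    "BC": [(0.8, ["B", "C"])],
}

Step = Tuple[str, List[str]]


def is_plausible(precursors: List[str]) -> bool:
    """Stand-in for a filter network: reject steps with unknown fragments."""
    return all(p in BUILDING_BLOCKS or p in RULES for p in precursors)


def plan(target: str, depth: int = 5) -> Optional[List[Step]]:
    """Return a list of (product, precursors) steps, or None if unsolved."""
    if target in BUILDING_BLOCKS:
        return []  # already purchasable, nothing to do
    if depth == 0:
        return None  # depth budget exhausted
    # Try the highest-prior disconnections first.
    candidates = sorted(RULES.get(target, []), key=lambda c: c[0], reverse=True)
    for _prior, precursors in candidates:
        if not is_plausible(precursors):
            continue
        route: List[Step] = [(target, precursors)]
        for precursor in precursors:
            sub_route = plan(precursor, depth - 1)
            if sub_route is None:
                break  # this disconnection fails; try the next candidate
            route.extend(sub_route)
        else:
            return route  # every precursor was solved
    return None


if __name__ == "__main__":
    print(plan("ABC"))
```

In this toy, a route counts as solved once every leaf is a building block. A tree search such as the one named in the abstract would instead balance exploring several candidate disconnections rather than committing to the top-ranked one.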
