Predicting retrosynthetic pathways using a combined linguistic model and hyper-graph exploration strategy

We present an extension of our Molecular Transformer architecture, combined with a hyper-graph exploration strategy, for automatic retrosynthesis route planning without human intervention. The single-step retrosynthetic model sets a new state of the art for predicting not only reactants but also reagents, solvents and catalysts for each retrosynthetic step. We introduce new metrics (coverage, class diversity, round-trip accuracy and Jensen-Shannon divergence) to evaluate single-step retrosynthetic models. These metrics rely on a forward-prediction model and a reaction-classification model, both also based on the transformer architecture. The hyper-graph is constructed on the fly, and its nodes are filtered and further expanded according to a Bayesian-like probability. We critically assessed the end-to-end framework on several retrosynthesis examples from the literature and from academic exams. Overall, the framework performs very well, with a few weaknesses attributable to bias introduced during training. The newly introduced metrics open up the possibility of optimizing entire retrosynthetic frameworks by focusing solely on the performance of the single-step model.
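The following minimal Python sketch shows how single-step metrics of this kind can be computed once a forward model and a reaction classifier are available. It makes several assumptions. The `Prediction` record is hypothetical and holds the predicted precursors, the top-1 product returned by the forward model, and the class label from the classification model. All SMILES are assumed to be canonicalised already, so string equality means identical molecules. The metric definitions are also assumptions. Round-trip accuracy is the fraction of suggestions whose forward prediction recovers the target. Coverage is the fraction of targets with at least one such valid suggestion. Class diversity is the mean number of distinct reaction classes among valid suggestions per target. The Jensen-Shannon divergence is the equal-weight generalised form built on Shannon entropy. The sketch does not say which distributions that divergence is applied to.

```python
import math
from dataclasses import dataclass


@dataclass
class Prediction:
    """One single-step retrosynthetic suggestion (hypothetical record)."""
    precursors: str       # predicted precursor SMILES, assumed canonical
    forward_product: str  # top-1 product of a forward model run on `precursors`
    reaction_class: str   # label assigned by a reaction classification model


def is_valid(target: str, pred: Prediction) -> bool:
    # Round-trip check: the forward model must recover the original target.
    return pred.forward_product == target


def round_trip_accuracy(results: dict[str, list[Prediction]]) -> float:
    """Fraction of all suggestions that pass the round-trip check."""
    total = sum(len(preds) for preds in results.values())
    valid = sum(is_valid(t, p) for t, preds in results.items() for p in preds)
    return valid / total if total else 0.0


def coverage(results: dict[str, list[Prediction]]) -> float:
    """Fraction of targets with at least one valid suggestion."""
    if not results:
        return 0.0
    hits = sum(any(is_valid(t, p) for p in preds) for t, preds in results.items())
    return hits / len(results)


def class_diversity(results: dict[str, list[Prediction]]) -> float:
    """Mean number of distinct reaction classes among valid suggestions per target."""
    if not results:
        return 0.0
    per_target = [
        len({p.reaction_class for p in preds if is_valid(t, p)})
        for t, preds in results.items()
    ]
    return sum(per_target) / len(per_target)


def entropy(dist: list[float]) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def jensen_shannon_divergence(dists: list[list[float]]) -> float:
    """Equal-weight generalised JSD: H(mean of dists) - mean of H(dist), in bits."""
    n = len(dists)
    mixture = [sum(column) / n for column in zip(*dists)]
    return entropy(mixture) - sum(entropy(d) for d in dists) / n


# Toy example with made-up predictions for two targets.
results = {
    "CCO": [
        Prediction("CC=O", "CCO", "reduction"),
        Prediction("CCBr", "CCO", "substitution"),
    ],
    "CC(=O)O": [
        # The forward model stops at the aldehyde, so the round trip fails.
        Prediction("CCO", "CC=O", "oxidation"),
    ],
}

print(round_trip_accuracy(results))  # 2 of 3 suggestions are valid
print(coverage(results))             # 0.5: only "CCO" is covered
print(class_diversity(results))      # 1.0: (2 + 0) / 2
print(jensen_shannon_divergence([[1.0, 0.0], [0.0, 1.0]]))  # 1.0 for disjoint distributions
```

These metrics need only the forward model and the classifier, not a full route search. That is what makes them usable as cheap proxies when tuning the single-step model. In practice the canonicalisation step (for example with a cheminformatics toolkit) matters. Without it, the string comparison in `is_valid` would undercount valid round trips.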
