Computer-Assisted Retrosynthesis Based on Molecular Similarity

We demonstrate molecular similarity to be a surprisingly effective metric for proposing and ranking one-step retrosynthetic disconnections based on analogy to precedent reactions. The developed approach mimics the retrosynthetic strategy defined implicitly by a corpus of known reactions without the need to encode any chemical knowledge. Using 40 000 reactions from the patent literature as a knowledge base, the recorded reactants are among the top 10 proposed precursors in 74.1% of 5000 test reactions, providing strong quantitative support for our methodology. Extension of the one-step strategy to multistep pathway planning is demonstrated and discussed for two exemplary drug products.
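The core idea of ranking precedent reactions by similarity to a target can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not specify the fingerprint or the scoring function, so the sketch assumes a Tanimoto coefficient over a toy feature set (character trigrams of a SMILES string, standing in for a real molecular fingerprint). The names `ngram_features`, `rank_precedents`, and `KNOWLEDGE_BASE` are hypothetical, and the two reactions are made-up examples rather than entries from the patent data set.

```python
from typing import List, Set, Tuple


def ngram_features(smiles: str, n: int = 3) -> Set[str]:
    """Toy stand-in for a fingerprint: the set of character n-grams of a SMILES string."""
    padded = f"^{smiles}$"  # mark the start and end of the string
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) coefficient: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_precedents(
    target: str,
    knowledge_base: List[Tuple[str, str]],
    top_k: int = 10,
) -> List[Tuple[float, str, str]]:
    """Return the top_k (score, reactants, product) precedents whose
    recorded product is most similar to the target molecule."""
    target_fp = ngram_features(target)
    scored = [
        (tanimoto(target_fp, ngram_features(product)), reactants, product)
        for reactants, product in knowledge_base
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical two-entry knowledge base of (reactants, product) SMILES pairs.
KNOWLEDGE_BASE = [
    ("CC(=O)Cl.NC", "CC(=O)NC"),    # amide formation
    ("CC(=O)O.OCC", "CC(=O)OCC"),   # esterification
]

if __name__ == "__main__":
    # Target: ethyl propanoate; the esterification precedent should rank first.
    for score, reactants, product in rank_precedents("CCC(=O)OCC", KNOWLEDGE_BASE):
        print(f"{score:.3f}  {reactants} >> {product}")
```

In a full workflow, the transformation recorded in each highly ranked precedent would then be applied to the target to generate candidate precursors; that step requires a cheminformatics toolkit and is omitted here.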
