RetCL: A Selection-based Approach for Retrosynthesis via Contrastive Learning

Retrosynthesis, whose goal is to find a set of reactants for synthesizing a target product, is an emerging research area of deep learning. While existing approaches have shown promising results, they cannot account for the availability (e.g., stability or purchasability) of the reactants, nor do they generalize to unseen reaction templates (i.e., chemical reaction rules). In this paper, we propose a new approach that mitigates these issues by reformulating retrosynthesis as a problem of selecting reactants from a candidate set of commercially available molecules. To this end, we design an efficient reactant selection framework, named RetCL (retrosynthesis via contrastive learning), that enumerates the candidate molecules according to selection scores computed by graph neural networks. To learn the score functions, we also propose a novel contrastive training scheme with hard negative mining. Extensive experiments demonstrate the benefits of the proposed selection-based approach. For example, when all 671k reactants in the USPTO database are given as candidates, RetCL achieves a top-1 exact match accuracy of 71.3% on the USPTO-50k benchmark, while a recent transformer-based approach achieves 59.6%. We also demonstrate that, in contrast to template-based approaches, RetCL generalizes well to unseen templates in various settings.
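As a rough illustration of the selection view described in the abstract, the sketch below ranks a small candidate set by a similarity score and computes a contrastive loss against the hardest negatives. It is a minimal toy under stated assumptions, not the authors' implementation: the two-dimensional vectors stand in for graph neural network embeddings, and the cosine score, the temperature value, and all function and variable names are hypothetical choices made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(product_vec, candidates, top_k=3):
    """Score every candidate against the product and return the top_k."""
    scored = [(name, cosine(product_vec, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def hardest_negatives(product_vec, candidates, positive, num_negatives=2):
    """Pick the non-positive candidates that score highest (hard negatives)."""
    negatives = [
        (name, cosine(product_vec, vec))
        for name, vec in candidates.items()
        if name != positive
    ]
    negatives.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in negatives[:num_negatives]]

def contrastive_loss(product_vec, candidates, positive, negatives, temperature=0.1):
    """Softmax cross-entropy of the positive against the chosen negatives."""
    pos_logit = cosine(product_vec, candidates[positive]) / temperature
    logits = [pos_logit] + [
        cosine(product_vec, candidates[name]) / temperature for name in negatives
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - pos_logit

# Hypothetical embeddings; in practice these would come from a trained encoder.
candidates = {
    "A": [1.0, 0.0],
    "B": [0.8, 0.6],
    "C": [0.0, 1.0],
    "D": [-1.0, 0.0],
}
product = [1.0, 0.1]

best = rank_candidates(product, candidates, top_k=1)
hard = hardest_negatives(product, candidates, positive="A", num_negatives=2)
loss_hard = contrastive_loss(product, candidates, "A", hard)
loss_easy = contrastive_loss(product, candidates, "A", ["D"])
```

In this toy, the negatives closest to the product contribute a larger loss than a distant one, which is the intuition behind mining hard negatives: they give the score function a stronger training signal than randomly chosen candidates would.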
