Unsupervised Attention-Guided Atom-Mapping

Knowing how atoms rearrange during a chemical transformation is fundamental to numerous applications aiming to accelerate organic synthesis and molecular discovery. This labelling is known as atom-mapping and is an NP-hard problem. Current solutions use a combination of graph-theoretical approaches, heuristics, and rule-based systems. Unfortunately, existing mappings and algorithms are often prone to errors and quality issues, which limit the effectiveness of supervised approaches. Transformer neural networks trained with self-supervision, on the other hand, have recently shown tremendous potential when applied to textual representations of domain-specific data, such as chemical reactions. Here we demonstrate that the attention weights learned by a Transformer, without supervision or human labelling, encode information about how atoms rearrange between reactants and products. We build a chemically agnostic, attention-guided reaction mapper that performs remarkably well in terms of both accuracy and speed, even for strongly imbalanced reactions. Our work suggests that unannotated collections of chemical reactions contain all the information needed to construct coherent sets of reaction rules. This finding provides the missing link between data-driven and rule-based approaches and will stimulate machine-assisted discovery in the chemical domain.

Code is available at: https://github.com/rxn4chemistry/rxnmapper
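
The abstract's central claim is that product-to-reactant attention already carries the mapping signal. As a minimal sketch of how such a signal could be turned into a mapping, the code below assumes a product-by-reactant attention matrix is already available and assigns atoms greedily: highest score first, each reactant atom used once, and a multiplicative boost for neighbours of atoms already mapped so that bonded fragments tend to map together. The function name greedy_attention_mapping, the same-element mask, the neighbor_boost value and the toy attention numbers are all hypothetical choices for illustration; this is not the rxnmapper package's implementation, and being a greedy heuristic it carries no optimality guarantee for the NP-hard general problem.

```python
from typing import Dict, List, Sequence, Set


def greedy_attention_mapping(
    attention: Sequence[Sequence[float]],
    product_elements: Sequence[str],
    reactant_elements: Sequence[str],
    product_neighbors: Sequence[Set[int]],
    reactant_neighbors: Sequence[Set[int]],
    neighbor_boost: float = 10.0,
) -> Dict[int, int]:
    """Greedily map product atom indices to reactant atom indices.

    attention[i][j] is assumed to be an attention score from product atom i
    to reactant atom j (e.g. averaged over selected heads). Hypothetical
    illustrative sketch, not rxnmapper's implementation.
    """
    # Copy the scores, masking pairs whose elements differ (an assumption).
    scores: List[List[float]] = [
        [
            attention[i][j] if product_elements[i] == reactant_elements[j] else 0.0
            for j in range(len(reactant_elements))
        ]
        for i in range(len(product_elements))
    ]
    mapping: Dict[int, int] = {}
    for _ in range(len(product_elements)):
        # Find the highest-scoring pair among still-unmapped product atoms.
        best_i, best_j, best = -1, -1, 0.0
        for i, row in enumerate(scores):
            if i in mapping:
                continue
            for j, s in enumerate(row):
                if s > best:
                    best_i, best_j, best = i, j, s
        if best_i < 0:
            break  # no positive score left: remaining atoms stay unmapped
        mapping[best_i] = best_j
        # A reactant atom may be used only once, so mask its column.
        for row in scores:
            row[best_j] = 0.0
        # Favour mapping neighbours of the product atom onto neighbours of
        # the chosen reactant atom, so bonded fragments map together.
        for pi in product_neighbors[best_i]:
            for rj in reactant_neighbors[best_j]:
                scores[pi][rj] *= neighbor_boost
    return mapping


# Toy example (made-up attention values): acetaldehyde C0-C1=O2 plus a water
# oxygen O3 on the reactant side, ethanol C0-C1-O2 on the product side.
attn = [
    [0.60, 0.30, 0.05, 0.05],  # product C0
    [0.35, 0.50, 0.10, 0.05],  # product C1
    [0.05, 0.10, 0.40, 0.45],  # product O2 (slightly prefers the water O)
]
prod_el, react_el = ["C", "C", "O"], ["C", "C", "O", "O"]
prod_nb = [{1}, {0, 2}, {1}]
react_nb = [{1}, {0, 2}, {1}, set()]

print(greedy_attention_mapping(attn, prod_el, react_el, prod_nb, react_nb))
# → {0: 0, 1: 1, 2: 2}
print(greedy_attention_mapping(attn, prod_el, react_el, prod_nb, react_nb, neighbor_boost=1.0))
# → {0: 0, 1: 1, 2: 3}
```

With neighbor_boost=1.0 the product oxygen simply follows its raw attention to the water oxygen; with the boost, the already-mapped C-C fragment pulls it onto the acetaldehyde oxygen instead. A complete mapper would additionally need to tokenize reaction SMILES, extract attention from a trained model, and write the result back as atom-map numbers, all of which this sketch omits.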