Data Transfer Approaches to Improve Seq-to-Seq Retrosynthesis

Retrosynthesis is the problem of inferring the reactant compounds that synthesize a given product compound through chemical reactions. Recent studies on retrosynthesis focus on proposing ever more sophisticated prediction models, but the dataset used to train the models also plays an essential role in obtaining models that generalize well. In general, the dataset best suited to a specific task tends to be small; in such cases, the standard remedy is to transfer knowledge from a large or clean dataset in the same domain. In this paper, we conduct a systematic and intensive examination of data transfer approaches for end-to-end generative models applied to retrosynthesis. Experimental results show that typical data transfer methods improve the test prediction scores of an off-the-shelf Transformer baseline. In particular, the pre-training plus fine-tuning approach boosts the accuracy of the baseline, achieving a new state of the art. In addition, we manually inspect the erroneous predictions and find that the pre-trained and fine-tuned models generate chemically appropriate or sensible proposals in almost all cases.
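To make the pre-training plus fine-tuning recipe concrete, the sketch below shows the two training stages on a SMILES sequence-to-sequence Transformer in PyTorch. The character-level tokenizer, model size, learning rates, and placeholder reaction pairs are illustrative assumptions rather than the configuration reported here; positional encodings and batching are omitted for brevity.

```python
# Minimal sketch of the pre-training + fine-tuning recipe for SMILES
# sequence-to-sequence retrosynthesis (PyTorch). All hyperparameters and
# data below are placeholders, not the paper's actual configuration.
import torch
import torch.nn as nn

PAD, BOS, EOS = 0, 1, 2
VOCAB = {c: i + 3 for i, c in enumerate("()[]=#@+-.\\/0123456789BCFHINOPSclnos")}

def encode(smiles: str, max_len: int = 128) -> torch.Tensor:
    """Character-level encoding of a SMILES string (a simplifying assumption;
    regex-based SMILES tokenization is more common in practice)."""
    ids = [BOS] + [VOCAB.get(c, PAD) for c in smiles][: max_len - 2] + [EOS]
    return torch.tensor(ids + [PAD] * (max_len - len(ids)))

class Seq2SeqTransformer(nn.Module):
    def __init__(self, vocab_size: int = len(VOCAB) + 3, d_model: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8, num_encoder_layers=4,
            num_decoder_layers=4, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src, tgt):
        # Causal mask so each decoder position only attends to earlier tokens.
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.embed(src), self.embed(tgt), tgt_mask=mask)
        return self.out(h)

def run_epoch(model, pairs, optimizer, loss_fn):
    """One pass over (product, reactants) SMILES pairs with teacher forcing."""
    for product, reactants in pairs:
        src, tgt = encode(product).unsqueeze(0), encode(reactants).unsqueeze(0)
        logits = model(src, tgt[:, :-1])
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
        optimizer.zero_grad(); loss.backward(); optimizer.step()

model = Seq2SeqTransformer()
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD)

# Stage 1: pre-train on the large external reaction corpus.
pretrain_pairs = [("CC(=O)Oc1ccccc1C(=O)O", "CC(=O)O.Oc1ccccc1C(=O)O")]  # placeholder
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
run_epoch(model, pretrain_pairs, opt, loss_fn)

# Stage 2: fine-tune the same weights on the small task-specific dataset,
# typically with a reduced learning rate.
finetune_pairs = [("CC(=O)Oc1ccccc1C(=O)O", "CC(=O)O.Oc1ccccc1C(=O)O")]  # placeholder
opt_ft = torch.optim.Adam(model.parameters(), lr=1e-5)
run_epoch(model, finetune_pairs, opt_ft, loss_fn)
```

The key point of the recipe is that the fine-tuning stage reuses the pre-trained weights unchanged, so knowledge from the large dataset carries over to the small target dataset.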
