Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT

Deep neural networks can learn directly from chemical structures, without extensive user-driven descriptor selection, to predict molecular properties and activities with high reliability. However, these approaches typically require very large training sets to learn the best endpoint-specific structural features and ensure reasonable prediction accuracy. Even though large datasets are becoming the new normal in drug discovery, especially for high-throughput screening and metabolomics, smaller datasets with very challenging endpoints must also be modeled and predicted. It would therefore be highly relevant to exploit the vast compendium of unlabeled compounds in publicly available datasets to improve model performance for a user’s particular series of compounds. In this study, we propose the Molecular Prediction Model Fine-Tuning (MolPMoFiT) approach, an effective transfer learning method that can be applied to any QSPR/QSAR problem. A large-scale molecular structure prediction model is pre-trained on one million unlabeled molecules from ChEMBL in a self-supervised manner and can then be fine-tuned on various QSPR/QSAR tasks for smaller chemical datasets with specific endpoints. Herein, the method is evaluated on three benchmark datasets (lipophilicity, HIV, and blood-brain barrier penetration). The results show that the method achieves comparable or better prediction performance on all three datasets relative to state-of-the-art prediction techniques reported in the literature so far.
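To make the idea of self-supervised pre-training on unlabeled molecules more concrete, the following is a minimal sketch of how training targets can be derived from the structures themselves. It assumes that molecules are written as SMILES strings and that the pre-training objective is next-token prediction; neither detail is stated in the abstract. The regular expression and the names `tokenize` and `next_token_pairs` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import re

# Hypothetical, simplified SMILES tokenizer (an assumption for illustration):
# bracket atoms such as [NH4+] and the two-letter halogens Br and Cl are kept
# as single tokens; every other character becomes its own token.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def next_token_pairs(smiles, bos="<bos>", eos="<eos>"):
    """Build (input token, target token) pairs for next-token prediction.

    No activity label is needed: each target is simply the token that
    follows the input token in the same molecule.
    """
    tokens = [bos] + tokenize(smiles) + [eos]
    return list(zip(tokens[:-1], tokens[1:]))


if __name__ == "__main__":
    # Acetyl chloride, used here only as a small example molecule.
    for source, target in next_token_pairs("CC(=O)Cl"):
        print(source, "->", target)
```

Under these assumptions, a model pre-trained on such pairs needs no labeled data, and its weights could then serve as the starting point for fine-tuning on a small labeled dataset, which is the transfer step the abstract describes.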
