A novel molecular representation with BiGRU neural networks for learning atom

Molecular representations play critical roles in research on drug design and molecular properties, and effective representations help with molecular computation and with solving related problems in drug discovery. Most traditional molecular representations are based on hand-crafted features and rely heavily on biological experiments, which are often costly and time-consuming. However, recent research has achieved promising results using machine learning in various domains. In this article, we present a novel method named Smi2Vec-BiGRU that is designed for learning atoms and solving single- and multitask binary classification problems in drug discovery, which are basic and key problems in this field. Specifically, our approach transforms molecule data in the SMILES format into a set of sample vectors and then feeds them into bidirectional gated recurrent unit (BiGRU) neural networks for training, which learn low-dimensional vector representations of drug molecules. We conduct extensive experiments on several widely used benchmarks, including Tox21, SIDER and ClinTox. The experimental results show that our approach can achieve state-of-the-art performance on these benchmark datasets, demonstrating the feasibility and competitiveness of the proposed approach.
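The first step described above, turning SMILES strings into fixed-length inputs for a recurrent network, can be illustrated with a minimal sketch. The abstract does not specify how Smi2Vec tokenizes or vectorizes molecules, so everything here is an assumption made for illustration: the regular expression, the helper names `tokenize`, `build_vocab` and `encode`, the `<pad>`/`<unk>` tokens, and the example molecules.

```python
import re

# Hypothetical tokenizer pattern (an assumption, not the paper's procedure).
# Bracket atoms, the two-letter halogens Br and Cl, and two-digit ring
# closures (%NN) are kept as single tokens; anything else is one character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into atom-level and symbol-level tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Assign an integer id to every token seen in the given molecules."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in smiles_list:
        for token in tokenize(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles, vocab, max_len):
    """Map a SMILES string to a fixed-length list of token ids.

    Sequences longer than max_len are truncated; shorter ones are
    right-padded with the <pad> id. Unseen tokens map to <unk>.
    """
    ids = [vocab.get(token, vocab["<unk>"]) for token in tokenize(smiles)]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Illustrative molecules: ethanol, chlorobenzene, tetramethylammonium.
molecules = ["CCO", "c1ccccc1Cl", "C[N+](C)(C)C"]
vocab = build_vocab(molecules)
encoded = [encode(smiles, vocab, max_len=12) for smiles in molecules]

for smiles, ids in zip(molecules, encoded):
    print(smiles, ids)
```

In a full pipeline, each token id would typically index a learned embedding table, and the resulting vector sequence would be read by a GRU in both directions, with the two final hidden states concatenated before a sigmoid output layer per task. That downstream part is omitted here because the abstract gives no layer sizes or training details.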
