Deep Learning Based Regression and Multiclass Models for Acute Oral Toxicity Prediction with Automatic Chemical Feature Extraction

The median lethal dose (LD50) is a general indicator of compound acute oral toxicity (AOT). Various in silico methods have been developed for AOT prediction to reduce cost and time. In this study, we developed an improved molecular graph encoding convolutional neural network (MGE-CNN) architecture to construct three types of high-quality AOT models: a regression model (deepAOT-R), a multiclassification model (deepAOT-C), and a multitask model (deepAOT-CR). These predictive models substantially outperformed previously reported models. On the two external data sets, containing 1673 (test set I) and 375 (test set II) compounds, the R2 and mean absolute error (MAE) of deepAOT-R on test set I were 0.864 and 0.195, respectively, and the prediction accuracies of deepAOT-C were 95.5% and 96.3% on test sets I and II, respectively. The corresponding external prediction accuracies of deepAOT-CR were 95.0% and 94.1%, while its R2 and MAE on test set I were 0.861 and 0.204, respectively. We then performed forward and backward exploration of the deepAOT models to obtain deep fingerprints, which could support shallow machine learning methods more efficiently than traditional fingerprints or descriptors. We further exploited automatic feature learning, a key element of deep learning, to map the corresponding activation values into fragment space and derive AOT-related chemical substructures by reverse mining of the features. Our deep learning architecture for AOT is generally applicable to predicting and exploring other toxicity or property end points of chemical compounds. The two deepAOT models are freely available at http://repharma.pku.edu.cn/DLAOT/DLAOThome.php or http://www.pkumdl.cn/DLAOT/DLAOThome.php.
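The three metrics reported above can be illustrated with a minimal sketch. It assumes that R2 denotes the coefficient of determination (the paper may instead use the squared Pearson correlation), that MAE is computed on the regression targets, and that accuracy is the fraction of correctly assigned toxicity classes. The function names and the toy values are hypothetical and are not taken from the deepAOT code or data.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between experimental and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition of R2)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true: Sequence[int], labels_pred: Sequence[int]) -> float:
    """Fraction of compounds assigned to the correct toxicity class."""
    hits = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return hits / len(labels_true)


# Toy values for illustration only (hypothetical, not from the paper).
toy_true = [2.0, 3.0, 4.0, 5.0]
toy_pred = [2.5, 3.0, 3.5, 5.0]
toy_class_true = [1, 2, 3, 4]
toy_class_pred = [1, 2, 3, 3]

if __name__ == "__main__":
    print("MAE:", mean_absolute_error(toy_true, toy_pred))
    print("R2:", r_squared(toy_true, toy_pred))
    print("Accuracy:", accuracy(toy_class_true, toy_class_pred))
```

A regression model such as deepAOT-R would be scored with the first two functions, and a classifier such as deepAOT-C with the third. A multitask model would presumably report all three.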