Deep Learning-Based Structure-Activity Relationship Modeling for Multi-Category Toxicity Classification: A Case Study of 10K Tox21 Chemicals With High-Throughput Cell-Based Androgen Receptor Bioassay Data

Deep learning (DL) has attracted the attention of computational toxicologists because it potentially offers greater power for in silico predictive toxicology than existing shallow learning algorithms. However, contradictory reports have been documented. To further explore the advantages of DL over shallow learning, we conducted a case study using two cell-based androgen receptor (AR) activity datasets covering 10K chemicals generated by the Tox21 program. A nested double-loop cross-validation approach was adopted, along with a stratified sampling strategy that partitioned chemicals of multiple AR activity classes (i.e., agonist, antagonist, inactive, and inconclusive) at the same distribution rates among the training, validation, and test subsets. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22–27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Further in-depth analyses of chemical scaffolding shed light on structural alerts for AR agonists/antagonists and inactive/inconclusive compounds, which may aid future drug discovery and the improvement of toxicity prediction modeling.
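The nested double-loop cross-validation with stratified sampling described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`stratified_folds`, `nested_stratified_splits`), the fold counts (5 outer, 4 inner), and the toy class counts are assumptions chosen for illustration. The outer loop holds out a test fold, the inner loop splits the remaining chemicals into training and validation subsets, and each class is dealt out separately so that all subsets keep roughly the same class proportions.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(indices, labels, k, seed=0):
    """Split `indices` into k folds, dealing each class out round-robin
    so every fold keeps roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for cls in sorted(by_class):
        members = by_class[cls]
        rng.shuffle(members)  # randomize order within the class
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


def nested_stratified_splits(labels, outer_k=5, inner_k=4, seed=0):
    """Yield (train, validation, test) index lists for nested CV.

    Outer loop: one fold is held out as the test subset.
    Inner loop: the remaining samples are re-split into
    training and validation subsets (hypothetical fold counts).
    """
    all_idx = list(range(len(labels)))
    outer = stratified_folds(all_idx, labels, outer_k, seed)
    for o, test in enumerate(outer):
        rest = [i for f, fold in enumerate(outer) if f != o for i in fold]
        inner = stratified_folds(rest, labels, inner_k, seed + o + 1)
        for v, val in enumerate(inner):
            train = [i for f, fold in enumerate(inner) if f != v for i in fold]
            yield train, val, test


# Toy labels with made-up class counts (not the Tox21 distribution).
labels = (["inactive"] * 60 + ["agonist"] * 20
          + ["antagonist"] * 20 + ["inconclusive"] * 20)
splits = list(nested_stratified_splits(labels))
train, val, test = splits[0]
for name, subset in (("train", train), ("validation", val), ("test", test)):
    print(name, dict(Counter(labels[i] for i in subset)))
```

In this sketch, hyperparameters would be tuned on the validation subset inside the inner loop and performance reported only on the outer test fold, which is the usual motivation for nesting: the test data never influences model selection.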
