Comparative Study of Multitask Toxicity Modeling on a Broad Chemical Space

Acute toxicity is one of the most challenging properties to predict by purely computational methods because it depends directly on biological interactions. Moreover, toxicity can be represented by different end points, since it can be measured in different species and by different routes of administration, and it is an open question whether knowledge can be transferred between end points. We performed a comparative study of multitask toxicity prediction across a broad chemical space using different descriptors and modeling algorithms, and applied multitask learning to a large toxicity data set extracted from the Registry of Toxic Effects of Chemical Substances (RTECS). We demonstrated that multitask modeling provides a significant improvement over single-output models and other machine learning methods. Our research reveals that multitask learning can be very useful for improving the quality of acute toxicity modeling and opens a discussion about the use of multitask approaches for regulatory purposes. Our MultiTox models are freely available on the OCHEM platform (ochem.eu/multitox) under a CC-BY-NC license.
