Efficient Toxicity Prediction via Simple Features Using Shallow Neural Networks and Decision Trees

Toxicity prediction of chemical compounds is a grand challenge. Recent work has made significant progress in accuracy, but at the cost of using huge feature sets, complex black-box techniques such as deep neural networks, and enormous computational resources. In this paper, we argue strongly for models and methods that are simple in their machine learning characteristics, efficient in their use of computing resources, and still powerful enough to achieve very high accuracy. To demonstrate this, we develop a single-task chemical toxicity prediction framework that uses only 2D features, which are less compute-intensive to generate. We use a decision tree to select an optimal number of features from a collection of thousands. We then use a shallow neural network and jointly optimize it with the decision tree, taking both the network parameters and the input features into account. Our model needs only a minute of training on a single CPU, whereas existing methods based on deep neural networks need about 10 min on an NVIDIA Tesla K40 GPU. Despite this, we obtain similar or better performance on several toxicity benchmark tasks. We also develop a cumulative feature ranking method that identifies features that can help chemists prescreen toxic compounds effectively.
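The abstract does not spell out how the cumulative feature ranking is computed. As a minimal sketch under one plausible reading (an assumption, not the authors' stated procedure), each feature's importance score could be summed across the single-task models and the features ranked by that total. The task names, feature names, and scores below are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def cumulative_feature_ranking(per_task_importances):
    """Rank features by their importance summed over all tasks.

    per_task_importances maps a task name to a dict of
    {feature name: importance score} for that task's model.
    Assumption: "cumulative" means a plain sum across tasks.
    """
    totals = defaultdict(float)
    for importances in per_task_importances.values():
        for feature, score in importances.items():
            totals[feature] += score
    # Highest total first; ties are broken by feature name so the
    # result is deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical importance scores for three single-task models.
example = {
    "task_a": {"logP": 0.5, "mol_weight": 0.25, "ring_count": 0.25},
    "task_b": {"logP": 0.25, "mol_weight": 0.5, "h_donors": 0.25},
    "task_c": {"logP": 0.5, "h_donors": 0.5},
}

ranking = cumulative_feature_ranking(example)
for feature, total in ranking:
    print(f"{feature}: {total:.2f}")
```

A feature that is moderately important in many tasks can outrank one that dominates a single task, which is the kind of cross-task signal a prescreening shortlist would need. Other aggregation rules (for example, averaging ranks instead of summing scores) would fit the same description equally well.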
