Toxicity Prediction Using Pre-trained Autoencoder

Toxicology in the 21st Century (Tox21) is a collaborative initiative that investigates and develops efficient testing approaches for predicting the impact of chemical compounds on humans. In this paper, we investigate how a pre-trained autoencoder can be used to build classifiers that predict the toxicity of chemical compounds. Using a deep learning approach, we performed experiments to determine whether chemical compound fingerprints can be used to classify compounds as active or inactive, based on simplified molecular-input line-entry system (SMILES) representations, in twelve selected assays. We conducted these experiments using data from ChEMBL and Tox21 to investigate how the latent layer produced by an autoencoder can be used to train a classifier. All experimental results are compared against those of the winning teams of the Tox21 challenge, and the strengths and limitations of the proposed approaches are discussed.
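The general idea of training a classifier on an autoencoder's latent layer can be illustrated with a minimal sketch. It is not the architecture used in the paper: the random binary vectors standing in for fingerprints, the synthetic labels, the layer sizes, and all function names (`train_autoencoder`, `encode`, `train_logistic`) are assumptions made purely for illustration. The split into a larger unlabeled set for pre-training and a smaller labeled set for the classifier is likewise an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, latent_dim=8, epochs=200, lr=0.5):
    """Train a one-hidden-layer sigmoid autoencoder by full-batch gradient descent."""
    n, d = X.shape
    W_enc = rng.normal(0, 0.1, (d, latent_dim))
    b_enc = np.zeros(latent_dim)
    W_dec = rng.normal(0, 0.1, (latent_dim, d))
    b_dec = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)          # latent layer
        X_hat = sigmoid(H @ W_dec + b_dec)      # reconstruction
        # Gradient of binary cross-entropy w.r.t. the decoder pre-activation
        d_out = (X_hat - X) / n
        # Backpropagate through the decoder weights and the encoder sigmoid
        d_hid = (d_out @ W_dec.T) * H * (1.0 - H)
        W_dec -= lr * H.T @ d_out
        b_dec -= lr * d_out.sum(axis=0)
        W_enc -= lr * X.T @ d_hid
        b_enc -= lr * d_hid.sum(axis=0)
    return W_enc, b_enc

def encode(X, W_enc, b_enc):
    """Map inputs to the latent representation using the trained encoder."""
    return sigmoid(X @ W_enc + b_enc)

def train_logistic(Z, y, epochs=500, lr=0.5):
    """Fit a logistic-regression classifier on latent codes Z."""
    n, k = Z.shape
    w = np.zeros(k)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(Z @ w + b)
        g = (p - y) / n
        w -= lr * Z.T @ g
        b -= lr * g.sum()
    return w, b

# Hypothetical stand-in data: random 64-bit "fingerprints" (not real compounds).
X_unlabeled = (rng.random((500, 64)) < 0.2).astype(float)  # pre-training set
X_labeled = (rng.random((100, 64)) < 0.2).astype(float)    # labeled assay set
y_labeled = (X_labeled[:, :4].sum(axis=1) > 0).astype(float)  # synthetic labels

W_enc, b_enc = train_autoencoder(X_unlabeled)   # unsupervised pre-training
Z = encode(X_labeled, W_enc, b_enc)             # latent features for the classifier
w, b = train_logistic(Z, y_labeled)
probs = sigmoid(Z @ w + b)                      # predicted probability of "active"
```

In a realistic setting, the fingerprints would be computed from SMILES strings with a cheminformatics toolkit, one classifier would be trained per assay, and performance would be evaluated on held-out compounds rather than on the training set.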
