Paper information - Toxicity Prediction Using Pre-trained Autoencoder


Toxicology in the 21st Century (Tox21) is a collaborative initiative whose purpose is to investigate and develop efficient testing approaches for predicting the impact that chemical compounds have on humans. In this paper we examine how a pre-trained autoencoder can be used to build classifiers capable of predicting the toxicity of chemical compounds. Using a deep learning approach, we performed experiments to determine whether chemical compound fingerprints, derived from their simplified molecular-input line-entry system (SMILES) representations, can be used to distinguish active from inactive compounds in twelve selected assays. We conducted these experiments using data from ChEMBL and Tox21 to investigate how the latent layer produced by an autoencoder can be used to train a classifier. All experimental results are compared against those of the winning teams of the Tox21 challenge, and the strengths and limitations of the proposed approaches are discussed.
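The abstract describes the pipeline only at a high level: fingerprints are compressed by an autoencoder pre-trained on a larger compound set, and the latent layer then provides the input features of a toxicity classifier. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes a single sigmoid hidden layer as the latent layer, a binary cross-entropy reconstruction loss, plain logistic regression as the classifier, and ROC AUC as the score (the metric used to rank Tox21 challenge entries). The fingerprints are hypothetical random bit vectors standing in for real SMILES-derived fingerprints, and every name (`Autoencoder`, `make_fingerprints`, `train_logistic`, `roc_auc`) is illustrative.

```python
import math
import random

def sigmoid(z):
    # Clamp z so that math.exp never overflows
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

class Autoencoder:
    """One hidden (latent) layer: h = sigmoid(W x + b), x_hat = sigmoid(V h + c)."""
    def __init__(self, n_in, n_latent, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_latent)]
        self.b = [0.0] * n_latent
        self.V = [[rng.gauss(0, 0.1) for _ in range(n_latent)] for _ in range(n_in)]
        self.c = [0.0] * n_in

    def encode(self, x):
        # Latent representation of a fingerprint
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        # Reconstructed bit probabilities
        return [sigmoid(sum(v * hj for v, hj in zip(row, h)) + ci)
                for row, ci in zip(self.V, self.c)]

    def train_step(self, x, lr):
        # One SGD step on the binary cross-entropy reconstruction loss
        h = self.encode(x)
        x_hat = self.decode(h)
        # With sigmoid outputs and BCE, dL/d(output pre-activation) = x_hat - x
        d_out = [xh - xi for xh, xi in zip(x_hat, x)]
        # Backpropagate to the latent pre-activation, using V before it is updated
        d_h = [sum(self.V[i][j] * d_out[i] for i in range(len(x))) * h[j] * (1 - h[j])
               for j in range(len(h))]
        for i in range(len(x)):
            self.V[i] = [v - lr * d_out[i] * hj for v, hj in zip(self.V[i], h)]
        self.c = [ci - lr * d for ci, d in zip(self.c, d_out)]
        for j in range(len(h)):
            self.W[j] = [w - lr * d_h[j] * xi for w, xi in zip(self.W[j], x)]
        self.b = [bj - lr * d for bj, d in zip(self.b, d_h)]

def reconstruction_loss(ae, data):
    # Mean binary cross-entropy between fingerprints and their reconstructions
    total = 0.0
    for x in data:
        x_hat = ae.decode(ae.encode(x))
        total -= sum(xi * math.log(xh + 1e-9) + (1 - xi) * math.log(1 - xh + 1e-9)
                     for xi, xh in zip(x, x_hat))
    return total / len(data)

def make_fingerprints(n, n_bits, rng):
    # Hypothetical toy data: "active" compounds tend to set the first 4 bits
    X, y = [], []
    for _ in range(n):
        active = rng.random() < 0.3
        X.append([1.0 if rng.random() < (0.8 if active and k < 4 else 0.15) else 0.0
                  for k in range(n_bits)])
        y.append(1 if active else 0)
    return X, y

def train_logistic(Z, y, lr=0.5, epochs=200):
    # Plain logistic regression on the latent codes, trained by SGD
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, t in zip(Z, y):
            g = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            w = [wi - lr * g * zi for wi, zi in zip(w, z)]
            b -= lr * g
    return w, b

def roc_auc(scores, labels):
    # Probability that a random active outranks a random inactive (ties count 1/2)
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(42)
N_BITS, N_LATENT = 16, 6
unlabeled, _ = make_fingerprints(300, N_BITS, rng)      # stand-in for a large unlabeled set
X_train, y_train = make_fingerprints(200, N_BITS, rng)  # stand-in for one labeled assay
X_test, y_test = make_fingerprints(100, N_BITS, rng)
# 1) Pre-train the autoencoder on unlabeled fingerprints only
ae = Autoencoder(N_BITS, N_LATENT, seed=1)
loss_before = reconstruction_loss(ae, unlabeled)
for _ in range(20):
    for x in unlabeled:
        ae.train_step(x, lr=0.1)
loss_after = reconstruction_loss(ae, unlabeled)
# 2) Use the latent layer as the classifier's input features
w, b = train_logistic([ae.encode(x) for x in X_train], y_train)
scores = [sigmoid(sum(wi * zi for wi, zi in zip(w, ae.encode(x))) + b) for x in X_test]
auc = roc_auc(scores, y_test)
print(f"reconstruction loss {loss_before:.3f} -> {loss_after:.3f}, test ROC AUC {auc:.3f}")
```

Replacing `make_fingerprints` with real SMILES-derived fingerprints and fitting one classifier per assay (an assumption; the abstract does not state how the classifiers are organized) would mirror the overall structure described above. The layer sizes, learning rates and epoch counts are arbitrary choices that keep the toy run fast.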
