QSAR Modeling of Tox21 Challenge Stress Response and Nuclear Receptor Signaling Toxicity Assays

The ability to determine which environmental chemicals pose the greatest potential threats to human health remains one of the major concerns in regulatory toxicology. Computational methods that can accurately predict a chemical's toxic potential in silico are increasingly sought after to replace in vitro high-throughput screening (HTS) as well as controversial and costly in vivo animal studies. To this end, we have built Quantitative Structure-Activity Relationship (QSAR) models of twelve stress response and nuclear receptor signaling pathway toxicity assays as part of the 2014 Tox21 Challenge. Our models were built using Random Forest and Deep Neural Network algorithms with various combinations of descriptors and balancing protocols. Our models were statistically significant for all 12 assays, with balanced accuracy ranging from 0.58 to 0.82. Our results also show that models built with Deep Neural Networks had higher accuracy than those developed with simple machine learning algorithms and that dataset balancing led to a significant decrease in accuracy.
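Balanced accuracy, the metric reported above, is conventionally defined as the mean of sensitivity and specificity, which makes it less sensitive than plain accuracy to the large excess of inactive compounds typical of toxicity assays. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name and the toy labels are hypothetical and chosen only for illustration.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # actives predicted active
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # actives predicted inactive
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # inactives predicted inactive
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # inactives predicted active
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced assay: 2 actives and 8 inactives (illustrative data only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

ba_model = balanced_accuracy(y_true, y_pred)          # sensitivity 0.5, specificity 0.875
ba_all_inactive = balanced_accuracy(y_true, [0] * 10)  # sensitivity 0.0, specificity 1.0
print(ba_model, ba_all_inactive)
```

In this toy case a model that labels every compound inactive reaches a plain accuracy of 0.8 but a balanced accuracy of only 0.5, which is why the balanced form is the more informative figure for imbalanced assay data.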
