ToxCast EPA in Vitro to in Vivo Challenge: Insight into the Rank-I Model

The ToxCast EPA challenge was managed by TopCoder in spring 2014. The goal of the challenge was to develop a model that predicts the lowest effect level (LEL) concentration from in vitro measurements and calculated in silico descriptors. This article summarizes the computational steps used to develop the Rank-I model, which achieved the lowest prediction error on the secret test data set of the challenge. The model was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) and is freely available at http://ochem.eu/article/68104. Surprisingly, this model does not use any in vitro measurements. We describe the logic of the decision steps used to develop the model and the reasons for excluding in vitro measurements. We also show that including in vitro assays would not improve the accuracy of the model.
