SAR and QSAR modeling of a large collection of LD50 rat acute oral toxicity data

The median lethal dose (LD50) for rodent acute oral toxicity is a standard piece of information required to categorize chemicals by the hazard they pose to human health after acute exposure. Relying exclusively on in vivo testing is limited by the time and cost of the experiments and by the need to sacrifice animals. (Quantitative) structure–activity relationships [(Q)SAR] have proved to be a valid alternative for reducing and supporting in vivo assays in the assessment of acute toxicological hazard. As part of a new international collaborative project, the NTP Interagency Center for the Evaluation of Alternative Toxicological Methods and the U.S. Environmental Protection Agency’s National Center for Computational Toxicology compiled a large database of rat acute oral LD50 data to support the development of new computational models for predicting five regulatorily relevant acute toxicity endpoints. In this article, a series of regression and classification models was developed using different statistical and knowledge-based methodologies. External validation was performed to demonstrate the real-world predictivity of the models. Integrated modeling was then applied to improve on the performance of the single models. The statistical results confirmed both the relevance of the developed models in regulatory frameworks and the effectiveness of integrated modeling. The best integrated strategies reached root-mean-square errors (RMSEs) below 0.50, and the best classification models reached balanced accuracies above 0.70 for multi-class endpoints and above 0.80 for binary endpoints. The computed predictions will be hosted on the EPA’s Chemistry Dashboard and made freely available to the scientific community.
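
The performance figures quoted in the abstract, RMSE for the regression models and balanced accuracy for the classifiers, can be made concrete with a short Python sketch. The functions `rmse`, `balanced_accuracy` and `consensus_mean` are hypothetical helpers written for illustration only. Balanced accuracy follows its standard definition as the mean of per-class recall, which for a binary endpoint equals the average of sensitivity and specificity. The simple arithmetic averaging in `consensus_mean` is an assumed, minimal form of "integrated modeling"; the integration strategies actually used in the article may differ.

```python
import math
from statistics import mean


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true.

    For a binary endpoint this is (sensitivity + specificity) / 2.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return mean(recalls)


def consensus_mean(*model_preds):
    """Hypothetical integration: average each compound's prediction across models."""
    return [mean(values) for values in zip(*model_preds)]


if __name__ == "__main__":
    observed = [2.0, 3.0, 1.5, 2.5]   # illustrative values, not data from the article
    model_a = [2.2, 2.8, 1.9, 2.4]
    model_b = [1.8, 3.4, 1.3, 2.6]
    integrated = consensus_mean(model_a, model_b)
    print(rmse(observed, model_a), rmse(observed, model_b), rmse(observed, integrated))
    print(balanced_accuracy(["toxic", "toxic", "toxic", "nontoxic"],
                            ["toxic", "toxic", "nontoxic", "nontoxic"]))
```

Balanced accuracy is used rather than plain accuracy because acute toxicity classes are typically imbalanced, and averaging per-class recall prevents a model from scoring well simply by favoring the majority class.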
