Generating accurate in silico predictions of acute aquatic toxicity for a range of organic chemicals: Towards similarity-based machine learning methods.

The use of non-animal approaches, such as in silico and/or in vitro methods, for assessing the risks of hazardous chemicals has increased. A number of machine learning algorithms link molecular descriptors, which encode chemical structural properties, with biological activity. These computer-aided methods face several challenges, the most significant being the heterogeneity of datasets; more efficient and inclusive computational methods that can process large and heterogeneous chemical datasets are needed. In this context, this study verifies the utility of similarity-based machine learning methods in predicting the acute aquatic toxicity of diverse organic chemicals to Daphnia magna and Oryzias latipes. Two similarity-based methods were tested; rather than using the entire dataset, which encompasses a wide range of chemicals, each fits its model on a limited training subset made up of the chemicals most similar to a given fitting point. The kernel-weighted local polynomial approach had a number of advantages over the distance-weighted k-nearest neighbor (k-NN) algorithm. The results highlight the importance of lipophilicity, electrophilic reactivity, molecular polarizability, and size in determining acute toxicity. Rigorous model validation supports the use of this approach as a tool for estimating the toxicity of new or untested chemicals.
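The contrast between the two similarity-based methods can be illustrated with a minimal Python sketch. It is not the study's implementation: the Euclidean distance, the inverse-distance weights, the Gaussian kernel, the bandwidth rule (distance to the farthest selected neighbor), the first-order (local linear) polynomial, and all function names are assumptions chosen for illustration, since the abstract does not state these settings. Both predictors use only the training chemicals closest to the query point in descriptor space.

```python
import numpy as np

def nearest_neighbors(X_train, x_query, k):
    """Indices and Euclidean distances of the k training points closest to x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:k]
    return idx, dists[idx]

def knn_distance_weighted(X_train, y_train, x_query, k=5, eps=1e-12):
    """Distance-weighted k-NN: inverse-distance weighted mean of neighbor responses."""
    idx, d = nearest_neighbors(X_train, x_query, k)
    w = 1.0 / (d + eps)  # eps avoids division by zero for exact matches
    return float(np.sum(w * y_train[idx]) / np.sum(w))

def local_linear_kernel(X_train, y_train, x_query, k=10):
    """Kernel-weighted local linear fit on the k nearest neighbors (illustrative)."""
    idx, d = nearest_neighbors(X_train, x_query, k)
    # Assumption: bandwidth equals the distance to the farthest selected neighbor.
    h = d.max() + 1e-12
    w = np.exp(-0.5 * (d / h) ** 2)  # Gaussian kernel weights (assumption)
    # Centre descriptors at the query so the intercept is the prediction there.
    Z = np.hstack([np.ones((len(idx), 1)), X_train[idx] - x_query])
    sw = np.sqrt(w)
    # Weighted least squares solved as ordinary least squares on scaled rows.
    beta, *_ = np.linalg.lstsq(Z * sw[:, None], y_train[idx] * sw, rcond=None)
    return float(beta[0])

if __name__ == "__main__":
    # Synthetic stand-in for a descriptor matrix and toxicity endpoint.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))
    y = 2.0 * X[:, 0] - X[:, 1] + 1.0
    q = np.array([0.1, -0.2])
    print("k-NN prediction:        ", knn_distance_weighted(X, y, q))
    print("local linear prediction:", local_linear_kernel(X, y, q))
```

In this sketch the k-NN estimate is a weighted average, so it cannot fall outside the range of the neighbors' responses, whereas the local fit can follow a trend in the neighborhood. Whether that difference accounts for the advantages reported in the study is not something the sketch can establish.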
