QSAR Modeling is not “Push a Button and Find a Correlation”: A Case Study of Toxicity of (Benzo‐)triazoles on Algae

A case study on the toxicity of (benzo)triazoles ((B)TAZs) to the alga Pseudokirchneriella subcapitata is used to discuss some problems and solutions in QSAR modeling, particularly in the environmental context. MLR‐OLS models are developed from molecular descriptors calculated by various QSAR software tools (the commercial DRAGON and the free PaDEL‐Descriptor and QSPR‐THESAURUS). In the course of this development, the paper describes the relevance of data curation (not only of experimental data, but also of chemical structures and of input formats for the calculation of molecular descriptors) and the crucial points of QSAR model validation and potential application to new chemicals: internal robustness, exclusion of chance correlation, external predictivity, and applicability domain. Additionally, the utility of consensus models is highlighted. This work summarizes a methodology for a rigorous statistical approach to obtaining reliable QSAR predictions, even when starting from limited experimental data, including predictions for a large number of (B)TAZs in the ECHA preregistration list of REACH. It also revealed some ambiguities and discrepancies in SMILES notations from different databases, highlighted some general problems related to QSAR model generation, and was useful in the implementation of the PaDEL‐Descriptor software.
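
The validation points named in the abstract (internal robustness, exclusion of chance correlation, applicability domain) can be illustrated with a minimal sketch. It is not the authors' workflow or the code of any of the cited software tools: the data are synthetic, all function names are hypothetical, Q2 is computed with one common leave-one-out definition, chance correlation is probed by Y-scrambling, and the applicability domain is approximated by leverage with the frequently used warning value h* = 3(p+1)/n, which is an assumption here.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predict responses from coefficients returned by fit_ols."""
    return coef[0] + X @ coef[1:]

def r2(y, y_hat):
    """Coefficient of determination of the fitted values."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def q2_loo(X, y):
    """Leave-one-out Q2: refit without each compound, then predict it."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def y_scrambling_r2(X, y, n_iter=100, seed=0):
    """Mean R2 of models refitted on randomly permuted responses."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_iter):
        y_perm = rng.permutation(y)
        scores.append(r2(y_perm, predict(fit_ols(X, y_perm), X)))
    return float(np.mean(scores))

def leverages(X_train, X_query):
    """Leverage of each query compound in the training descriptor space."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    core = np.linalg.inv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, core, Q)

# Synthetic stand-in for a small training set: 30 compounds, 2 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.1, size=30)

coef = fit_ols(X, y)
r2_fit = r2(y, predict(coef, X))           # goodness of fit
q2 = q2_loo(X, y)                          # internal robustness
r2_scrambled = y_scrambling_r2(X, y)       # should be far below r2_fit
h_train = leverages(X, X)
h_star = 3 * (X.shape[1] + 1) / len(X)     # assumed leverage warning value
x_far = np.array([[6.0, -6.0]])            # descriptor values far from training
outside_domain = leverages(X, x_far)[0] > h_star

print(f"R2={r2_fit:.3f}  Q2_LOO={q2:.3f}  mean scrambled R2={r2_scrambled:.3f}")
print(f"h*={h_star:.3f}  far compound outside domain: {outside_domain}")
```

A model whose scrambled R2 approaches its real R2, or whose Q2 falls well below R2, would be suspect under the criteria the abstract lists; external predictivity would additionally require a held-out prediction set, which this sketch omits.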
