CHEMOMETRIC METHODS AND THEORETICAL MOLECULAR DESCRIPTORS IN PREDICTIVE QSAR MODELING OF THE ENVIRONMENTAL BEHAVIOR OF ORGANIC POLLUTANTS

This chapter surveys the QSAR modeling approaches developed by the author’s research group for the validated prediction of the environmental properties of organic pollutants. Various chemometric methods, based on different theoretical molecular descriptors, have been applied: exploratory techniques (such as PCA for ranking and SOM for similarity analysis), regression modeling by multiple linear regression (MLR, in particular OLS), and classification methods (mainly k-NN, CART, and CP-ANN). The review focuses on the main topics of environmental chemistry and ecotoxicology related to the physico-chemical properties, reactivity, and biological activity of chemicals of high environmental concern. It thus covers the atmospheric degradation of VOCs by tropospheric oxidants, the persistence and long-range transport of POPs, the sorption behavior of pesticides (Koc and leaching), bioconcentration, toxicity (acute aquatic toxicity, mutagenicity of PAHs, and estrogen binding activity of endocrine-disrupting compounds (EDCs)), and finally persistent, bioaccumulative, and toxic (PBT) behavior for the screening and prioritization of organic pollutants. Common to all the proposed models is the attention paid to validating predictive ability (not only internally, but also externally on chemicals that took no part in model development) and to checking the chemical applicability domain. Adherence to this policy, which the OECD principles also require, ensures that the predicted data are reliable and therefore useful under REACH, the new European regulation of chemicals.
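The validation policy described above (internal validation, external validation, and an applicability domain check for an OLS model) can be illustrated with a minimal sketch. It is not the author's code or data: the descriptors and responses are synthetic, all function names are hypothetical, internal validation is taken to be leave-one-out cross-validation, the external Q2 uses one common definition (referenced to the training-set mean), and the applicability domain is approximated by leverage with the commonly used warning value h* = 3(p + 1)/n.

```python
import numpy as np

def design(X):
    """Add an intercept column to a descriptor matrix."""
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Ordinary least squares fit; returns coefficients, intercept first."""
    coef, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return coef

def predict(coef, X):
    return design(X) @ coef

def leverages(X_train, X_query):
    """Leverage h_i = x_i (A^T A)^-1 x_i^T of each query chemical
    with respect to the training design matrix A."""
    A, Q = design(X_train), design(X_query)
    G = np.linalg.pinv(A.T @ A)
    return np.einsum("ij,jk,ik->i", Q, G, Q)

def q2_loo(X, y):
    """Leave-one-out Q2, using the OLS identity that the deleted
    residual equals e_i / (1 - h_i)."""
    resid = y - predict(fit_ols(X, y), X)
    h = leverages(X, X)
    press = np.sum((resid / (1.0 - h)) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def q2_ext(coef, X_test, y_test, y_train_mean):
    """External Q2 on chemicals not used to build the model
    (assumed definition: deviations taken from the training mean)."""
    press = np.sum((y_test - predict(coef, X_test)) ** 2)
    return 1.0 - press / np.sum((y_test - y_train_mean) ** 2)

# Synthetic stand-in for a descriptor table: 40 chemicals, 3 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 1.5 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.1, size=40)

# Simple split for illustration only: 30 training and 10 test chemicals.
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]

coef = fit_ols(X_train, y_train)
q2_internal = q2_loo(X_train, y_train)
q2_external = q2_ext(coef, X_test, y_test, y_train.mean())

# Leverage-based applicability domain check.
h_star = 3 * (X_train.shape[1] + 1) / len(X_train)
h_train = leverages(X_train, X_train)
h_far = leverages(X_train, np.array([[10.0, 10.0, 10.0]]))
outside_domain = h_far > h_star  # True marks a structurally distant query

print(f"Q2 LOO = {q2_internal:.3f}, Q2 ext = {q2_external:.3f}, h* = {h_star:.2f}")
```

A query chemical whose leverage exceeds h* lies far from the training chemicals in descriptor space, so its prediction is an extrapolation and should be treated as less reliable. In practice the training and test sets would be chosen by a designed split rather than by row order, as done here for brevity.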
