Estimating sensory irritation potency of volatile organic chemicals using QSARs based on decision tree methods for regulatory purpose

Volatile organic compounds (VOCs) are among the priority atmospheric pollutants with high indoor and outdoor exposure potential. The assessment of VOC toxicity to living ecosystems has received considerable attention in recent years, and various regulatory agencies have advocated the development of computational methods for the safety assessment of chemicals. This paper proposes robust and reliable quantitative structure–activity relationship (QSAR) models for estimating the sensory irritation potency of VOCs and for screening them. Decision tree (DT) based classification and regression QSAR models, namely single DT, decision tree forest (DTF), and decision tree boost (DTB), were developed from sensory irritation data on VOCs in mice, following the OECD principles. Structural diversity and nonlinearity in the data were evaluated through the Euclidean distance and the Brock–Dechert–Scheinkman statistic, respectively. The constructed QSAR models were validated with external test data, and their predictive performance was established through a set of coefficients recommended in the QSAR literature. All three methods performed satisfactorily in both classification and regression, with DTF and DTB performing relatively better. In the complete data, the DTF and DTB classification models achieved accuracies of 98.59 and 100 %, and the DTF and DTB regression models yielded correlations (R²) of 0.950 and 0.971, respectively. The lipoaffinity index and SwHBa were identified as the most influential descriptors in the proposed QSARs. The developed QSARs performed better than models reported in previous studies. They exhibited high statistical confidence and identified the structural properties of the VOCs responsible for their sensory irritation, and hence could be useful tools for screening chemicals for regulatory purposes.
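As a rough illustration of the two headline statistics quoted above, the following Python sketch computes a classification accuracy (in percent) and a squared correlation between observed and predicted potencies. It is not the authors' code. It assumes that R² denotes the squared Pearson correlation coefficient, which the abstract does not specify, and the function names and sample values are hypothetical.

```python
from math import fsum


def accuracy_percent(true_labels, predicted_labels):
    """Percentage of compounds whose predicted class matches the true class."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * hits / len(true_labels)


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: this is what 'R2' means here; other definitions
    (e.g. the coefficient of determination) can give different numbers.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_obs = fsum(observed) / n
    mean_pred = fsum(predicted) / n
    cov = fsum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = fsum((o - mean_obs) ** 2 for o in observed)
    var_pred = fsum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov * cov / (var_obs * var_pred)


if __name__ == "__main__":
    # Hypothetical class labels and potency values, for illustration only.
    true_cls = ["high", "low", "low", "high"]
    pred_cls = ["high", "low", "high", "high"]
    print(accuracy_percent(true_cls, pred_cls))

    obs = [1.2, 2.5, 3.1, 4.0]
    pred = [1.0, 2.7, 3.0, 4.2]
    print(squared_correlation(obs, pred))
```

Because the squared correlation ignores systematic offsets between observed and predicted values, a high value alone does not guarantee accurate predictions, which is one reason external validation with several coefficients is reported.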
