High-accuracy QSAR models of narcosis toxicities of phenols based on various data partition, descriptor selection and modelling methods

The Environmental Protection Agency regards quantitative structure–activity relationship (QSAR) analysis as a preferable alternative to toxicity tests. In this paper, we developed QSAR models to evaluate the narcosis toxicities of 50 phenol analogues. We first built multiple linear regression (MLR), stepwise multiple linear regression (SLR) and support vector regression (SVR) models using five descriptors and three different training-test partitions; under all three partitions, the optimal SVR models had the highest external prediction ability, about 10% higher than the models in the literature. Second, to identify more effective descriptors, we applied two in-house methods to select descriptors with clear meanings from the 1264 descriptors calculated by the PCLIENT software and used them to construct MLR, SLR and SVR models. Our best SVR model (R²pred = 0.972) achieved a significant improvement of 16.55% on the test set, and an appropriate partition gave better stability. The different partitions of the training-test datasets also supported the excellent predictive power of the best SVR model. We further evaluated the regression significance of our SVR model and, through interpretability analysis, the importance of each descriptor in the model. Our work provides a valuable exploration of different combinations of data partition, descriptor selection and modelling method, and a useful theoretical understanding of the toxicity of phenol analogues, especially for such a small dataset.
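
The abstract reports external prediction ability as a predictive R² on the test set but does not state the formula. As a minimal sketch, assuming the definition common in QSAR work (one minus the ratio of the test-set prediction error sum of squares to the test-set sum of squares about the training-set mean), the statistic could be computed as follows. The function name `r2_pred` and all data values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def r2_pred(y_test, y_hat, y_train_mean):
    """External predictive R^2 under the assumed definition:
    1 - PRESS / SS, with SS taken about the training-set mean."""
    if len(y_test) != len(y_hat):
        raise ValueError("y_test and y_hat must have the same length")
    # Prediction error sum of squares on the held-out test compounds.
    press = sum((y - p) ** 2 for y, p in zip(y_test, y_hat))
    # Spread of the test observations around the training-set mean.
    ss = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - press / ss


# Hypothetical toxicity values for one training-test partition.
y_train = [0.5, 1.0, 1.5, 2.0]
y_test = [0.8, 1.7]
y_hat = [0.9, 1.6]  # hypothetical model predictions for the test compounds

score = r2_pred(y_test, y_hat, mean(y_train))
print(round(score, 4))
```

Because the reference mean comes from the training set, the value depends on how the data are partitioned, which is one reason to compare models across several partitions as the abstract describes.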
