Prediction on the mutagenicity of nitroaromatic compounds using quantum chemistry descriptors based QSAR and machine learning derived classification methods.

Nitroaromatic compounds (NACs) are an important class of environmental organic pollutants. However, owing to limited resources, there is insufficient information on their potential adverse effects on human health and the environment. Using in silico methods to assess their potential hazards is therefore both urgent and promising. In this study, quantitative structure-activity relationship (QSAR) and classification models were constructed for a set of NACs based on their mutagenicity against the Salmonella typhimurium TA100 strain. For the QSAR studies, DRAGON descriptors together with quantum chemistry descriptors were calculated to characterize detailed molecular information. Descriptors were screened and QSAR models were developed using a genetic algorithm (GA) combined with multiple linear regression (MLR) analysis. For the classification studies, seven machine learning methods along with six molecular fingerprints were applied to develop qualitative classification models. The goodness of fit, reliability, robustness and predictive performance of all developed models were assessed with rigorous statistical validation criteria, and the best QSAR and classification models were then chosen. Moreover, the QSAR models with quantum chemistry descriptors were compared with those without quantum chemistry descriptors and with previously reported models. Notably, we also identified specific molecular properties and privileged substructures responsible for the high mutagenicity of NACs. Overall, the developed QSAR and classification models can be used as tools for rapidly predicting the mutagenicity of new or untested NACs for environmental hazard assessment and regulatory purposes, and may provide insights into the in vivo toxicity mechanisms of NACs and related compounds.
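As a rough illustration of the MLR fitting and internal validation mentioned above, the sketch below fits an ordinary least-squares model and computes the fitted R2 alongside the leave-one-out cross-validated Q2. It is not the authors' workflow: the GA descriptor selection step is omitted, the descriptor matrix and activity values are synthetic, and all function names are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares fit; returns coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict activities from a descriptor matrix and fitted coefficients."""
    return coef[0] + X @ coef[1:]


def r2(y, y_hat):
    """Coefficient of determination relative to the mean of y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def q2_loo(X, y):
    """Leave-one-out Q2: refit without each compound, then predict it."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef_i = fit_mlr(X[keep], y[keep])
        preds[i] = predict_mlr(coef_i, X[i:i + 1])[0]
    return r2(y, preds)


# Synthetic stand-in data (assumption): 40 "compounds", 3 "descriptors".
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 0.5 + 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(scale=0.1, size=40)

coef = fit_mlr(X, y)
r2_fit = r2(y, predict_mlr(coef, X))
q2 = q2_loo(X, y)
print(f"R2 = {r2_fit:.3f}, Q2_LOO = {q2:.3f}")
```

For a least-squares model, Q2 from leave-one-out can never exceed the fitted R2, so a large gap between the two is a common warning sign of overfitting. External validation on held-out compounds would still be needed to judge predictive performance.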
