Towards Global QSAR Model Building for Acute Toxicity: Munro Database Case Study

A set of 436 chemicals from the Munro database was studied with respect to their experimental LD50 values to investigate whether a global QSAR model for acute toxicity can be established. Dragon molecular descriptors were used for model development, and genetic algorithms were used to select the descriptors better correlated with the toxicity data. Toxicity values were discretized into qualitative classes on the basis of the Globally Harmonized System (GHS), with the 436 chemicals divided into three classes according to their experimental LD50 values: highly toxic, intermediately toxic, and low to non-toxic. The k-nearest neighbor (k-NN) classification method was calibrated on 25 molecular descriptors and gave a non-error rate (NER) of 0.66 and 0.57 for the internal and external prediction sets, respectively. Although the classification performance is not optimal, the subsequent analysis of the selected descriptors and their relationship with toxicity levels constitutes a step towards the development of a global QSAR model for acute toxicity.
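
To make the workflow in the abstract concrete, the following is a minimal Python sketch of its three ingredients: binning LD50 values into three classes, classifying by k-NN, and scoring with the NER, taken here as the mean of the per-class sensitivities (a common chemometrics definition, assumed rather than quoted from the paper). All function names, the class cutoffs, the toy two-descriptor data, and the choice of k are hypothetical; the abstract does not report the cutoffs, the value of k, or the distance metric used in the study.

```python
import math
from collections import Counter


def discretize_ld50(ld50, low_cut, high_cut):
    """Map an LD50 value (mg/kg) to one of three toxicity classes.

    The cutoffs are supplied by the caller; they are NOT taken from the paper.
    """
    if ld50 <= low_cut:
        return "high"
    if ld50 <= high_cut:
        return "intermediate"
    return "low"


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to the query.

    Euclidean distance is an assumption; the study's metric is not stated.
    """
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def non_error_rate(y_true, y_pred):
    """NER computed as the average of the per-class sensitivities."""
    sensitivities = []
    for cls in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(1 for i in members if y_pred[i] == cls)
        sensitivities.append(correct / len(members))
    return sum(sensitivities) / len(sensitivities)


# Toy data: two made-up descriptors per chemical (the study used 25).
train_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.1, 0.9), (2.0, 2.0), (2.1, 1.9)]
train_y = ["high", "high", "intermediate", "intermediate", "low", "low"]
test_X = [(0.05, 0.1), (1.05, 1.0), (2.05, 2.0), (0.9, 0.9)]
test_y = ["high", "intermediate", "low", "high"]

test_pred = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
ner = non_error_rate(test_y, test_pred)
print(test_pred, round(ner, 2))
```

Because the NER averages sensitivities class by class, a small class counts as much as a large one, which matters when the toxicity classes are unevenly populated.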
