Database mining with adaptive fuzzy partition: Application to the prediction of pesticide toxicity on rats

A data set of 235 pesticide compounds, divided into three classes according to their toxicity toward rats, was analyzed by a fuzzy logic procedure called adaptive fuzzy partition (AFP). This method establishes relationships between molecular descriptors and chemical activity by dynamically dividing the descriptor space into a set of fuzzily partitioned subspaces. A set of 153 molecular descriptors was analyzed, including topological, physicochemical, quantum mechanical, constitutional, and electronic parameters, and the most relevant descriptors were selected by a procedure combining genetic algorithm concepts and a stepwise method. The ability of the AFP model to assign compounds to the three toxicity classes was validated after dividing the data set into training and test sets of 165 and 70 molecules, respectively. The experimental class was correctly predicted for 76% of the test-set compounds. Furthermore, the most toxic class, which is particularly important for real applications of toxicity models, was correctly predicted in 86% of cases. Finally, a comparison with other classic classification techniques showed that AFP gave models with better predictive power.
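
The idea of classifying through fuzzily partitioned subspaces can be illustrated with a minimal sketch. It is not the authors' AFP implementation: it assumes a single descriptor scaled to [0, 1], a fixed partition into three overlapping triangular fuzzy sets (the adaptive, data-driven splitting of AFP is omitted), and hypothetical class labels and toy values chosen only for illustration.

```python
from collections import defaultdict

def memberships(x, centers, width):
    """Degree of membership of x in each triangular fuzzy set (one per subspace)."""
    return [max(0.0, 1.0 - abs(x - c) / width) for c in centers]

def fit(xs, labels, centers, width):
    """Accumulate, per subspace, the total membership contributed by each class."""
    weights = [defaultdict(float) for _ in centers]
    for x, y in zip(xs, labels):
        for k, mu in enumerate(memberships(x, centers, width)):
            weights[k][y] += mu
    return weights

def predict(x, weights, centers, width):
    """Combine the class proportions of each subspace, weighted by membership of x."""
    scores = defaultdict(float)
    for k, mu in enumerate(memberships(x, centers, width)):
        total = sum(weights[k].values())
        if total == 0:
            continue  # no training compound falls in this subspace
        for cls, w in weights[k].items():
            scores[cls] += mu * w / total
    return max(scores, key=scores.get)

# Toy data (hypothetical): one normalized descriptor, three toxicity classes.
centers, width = [0.0, 0.5, 1.0], 0.5
xs = [0.05, 0.10, 0.45, 0.55, 0.90, 0.95]
labels = ["low", "low", "medium", "medium", "high", "high"]

weights = fit(xs, labels, centers, width)
for query in (0.02, 0.50, 0.98):
    print(query, predict(query, weights, centers, width))
```

Because neighbouring fuzzy sets overlap, a compound near a subspace boundary receives contributions from both adjacent subspaces rather than being forced into one of them, which is the property that distinguishes a fuzzy partition from a crisp one.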
