Using fragment chemistry data mining and probabilistic neural networks in screening chemicals for acute toxicity to the fathead minnow

This paper illustrates how general data mining methodology may be adapted to the high-throughput virtual screening of organic chemicals for possible acute toxicity to the fathead minnow. The approach mines fragment information from chemical structures and uses probabilistic neural networks (PNNs) to model the relationship between structure and toxicity. Probabilistic neural networks implement a special class of multivariate non-linear Bayesian statistical models. The mathematical principles supporting their use for value prediction are clarified and their peculiarities discussed. As part of the research phase of the data mining process, a dataset consisting of 800 structures and associated fathead minnow (Pimephales promelas) 96-h LC50 acute toxicity endpoint information is used both to identify an advantageous combination of fragment descriptors and to train the neural networks. As a result, two models are generated. Model 1 implements the basic PNN with Gaussian kernel (statistical corrections included), while Model 2 implements the PNN with Gaussian kernel and separated variables. External validation is performed using a separate dataset consisting of 86 structures and associated toxicity information. Both the learning and the generalization capabilities of the two models are investigated and their limitations discussed.
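As a rough illustration of how a Gaussian-kernel PNN can be used for value prediction, the sketch below computes a kernel-weighted average of training targets. It is not the paper's implementation: the function name `pnn_predict`, the toy fragment counts, and the target values are hypothetical, and the "statistical corrections" mentioned for Model 1 are not reproduced. A scalar `sigma` is assumed to correspond loosely to a single shared kernel width, while a per-descriptor `sigma` vector is one plausible reading of "separated variables".

```python
import numpy as np

def pnn_predict(X_train, y_train, X_query, sigma):
    """Kernel-weighted average of training targets (illustrative sketch).

    sigma may be a scalar (one shared Gaussian width) or a vector with
    one width per descriptor (an assumed reading of "separated variables").
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), (X_train.shape[1],))

    # Scaled differences, shape (n_query, n_train, n_descriptors).
    diff = (X_query[:, None, :] - X_train[None, :, :]) / sigma
    sq_dist = np.sum(diff ** 2, axis=2)

    # Subtract each row's minimum before exponentiating so that at least
    # one weight equals 1 and the normalising sum cannot underflow to zero.
    weights = np.exp(-0.5 * (sq_dist - sq_dist.min(axis=1, keepdims=True)))
    return (weights @ y_train) / weights.sum(axis=1)

# Hypothetical toy data: rows are structures, columns are fragment counts;
# y holds made-up toxicity values on an arbitrary scale.
X = [[0, 1], [1, 0], [2, 2]]
y = [1.0, 2.0, 4.0]

shared = pnn_predict(X, y, [[0, 1]], sigma=0.1)            # one shared width
separate = pnn_predict(X, y, [[1, 5]], sigma=[0.1, 1e6])   # per-descriptor widths
smooth = pnn_predict(X, y, [[0, 1]], sigma=1e6)            # very wide kernel
```

With a narrow kernel the prediction is dominated by the nearest training structure; with a very wide kernel it tends toward the mean of the training targets. In practice the width(s) would be tuned on the training set and checked on external data, as the abstract describes for the 800-structure and 86-structure sets.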
