Lazy structure-activity relationships (lazar) for the prediction of rodent carcinogenicity and Salmonella mutagenicity

Abstract

lazar is a new tool for the prediction of toxic properties of chemical structures. It derives predictions for query structures from a database of experimentally determined toxicity data. lazar generates a prediction by searching the database for compounds that are similar to the query with respect to a given toxic activity and calculating the prediction from their activities. Apart from the prediction, lazar provides the rationales for it (structural features and similar compounds) and a reliable confidence index that indicates whether a query structure falls within the applicability domain of the training database.

Leave-one-out (LOO) crossvalidation experiments were carried out for 10 carcinogenicity endpoints ({female|male} {hamster|mouse|rat} carcinogenicity and the aggregate endpoints {hamster|mouse|rat} carcinogenicity and rodent carcinogenicity) and for Salmonella mutagenicity, all from the Carcinogenic Potency Database (CPDB). An external validation of Salmonella mutagenicity predictions was performed with a dataset of 3895 structures. The leave-one-out and external validation experiments indicate that Salmonella mutagenicity can be predicted with 85% accuracy for compounds within the applicability domain of the CPDB. The LOO accuracy of lazar predictions of rodent carcinogenicity is 86%; the accuracies for the other carcinogenicity endpoints vary between 78% and 95% for structures within the applicability domain.
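The general scheme described in the abstract (retrieve similar database compounds, derive a prediction from their activities, report a confidence value, and evaluate by leave-one-out crossvalidation) can be illustrated with a minimal Python sketch. This is not lazar's actual algorithm: the abstract speaks of similarity "with respect to a given toxic activity", whereas the sketch assumes a plain Tanimoto similarity over sets of structural features. The feature names, the `min_sim` threshold, the confidence formula, and all function names are hypothetical and chosen for illustration only.

```python
from typing import FrozenSet, List, Optional, Tuple

# A training compound: (set of hypothetical structural features, measured activity)
Compound = Tuple[FrozenSet[str], bool]


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(query: FrozenSet[str], training: List[Compound],
            min_sim: float = 0.3) -> Tuple[Optional[bool], float]:
    """Similarity-weighted vote over neighbors with similarity >= min_sim.

    Returns (prediction, confidence). The prediction is None when no
    neighbor is similar enough or the vote is tied, i.e. the query is
    treated as outside the applicability domain of this toy model.
    """
    neighbors = [(tanimoto(query, feats), active) for feats, active in training]
    neighbors = [(s, active) for s, active in neighbors if s >= min_sim]
    if not neighbors:
        return None, 0.0
    # Active neighbors vote +similarity, inactive neighbors vote -similarity.
    vote = sum(s if active else -s for s, active in neighbors)
    if vote == 0:
        return None, 0.0
    total = sum(s for s, _ in neighbors)
    return vote > 0, abs(vote) / total


def loo_accuracy(data: List[Compound], min_sim: float = 0.3) -> Tuple[float, int]:
    """Leave-one-out accuracy over the compounds that receive a prediction.

    Returns (accuracy, number of compounds predicted).
    """
    correct = predicted = 0
    for i, (feats, active) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out compound i
        pred, _ = predict(feats, rest, min_sim)
        if pred is None:
            continue  # no usable neighbors: not counted
        predicted += 1
        correct += pred == active
    accuracy = correct / predicted if predicted else float("nan")
    return accuracy, predicted


# Hypothetical toy data; the features and activities are invented.
TOY_DATA: List[Compound] = [
    (frozenset({"nitro", "aromatic"}), True),
    (frozenset({"nitro", "aromatic", "amine"}), True),
    (frozenset({"alkyl", "hydroxyl"}), False),
    (frozenset({"alkyl", "hydroxyl", "ether"}), False),
]
```

Calling `predict(frozenset({"nitro", "aromatic", "halogen"}), TOY_DATA)` returns a prediction together with a confidence value, and the neighbors that passed the threshold serve as the rationale for it. A query that shares no features with the database yields `None`, which loosely mirrors the idea of a structure falling outside the applicability domain.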
