An open source multistep model to predict mutagenicity from statistical analysis and relevant structural alerts

Background

Mutagenicity is the capability of a substance to cause genetic mutations. This property is of high public concern because it is closely related to carcinogenicity and potentially to reproductive toxicity. Experimentally, mutagenicity can be assessed by the Ames test on Salmonella, which has an estimated experimental reproducibility of 85%. This intrinsic limitation of the in vitro test, along with the need for faster and cheaper alternatives, opens the road to other types of assessment methods, such as in silico structure-activity prediction models.

A widely used method checks for the presence of known structural alerts for mutagenicity. However, the presence of such alerts alone does not definitively prove that a compound is mutagenic towards Salmonella, since other parts of the molecule can influence and potentially change the classification. Hence, statistically based methods are proposed, with the final objective of obtaining a cascade of modeling steps with custom-made properties, such as the reduction of false negatives.

Results

A cascade model has been developed and validated on a large public set of molecular structures and their associated Salmonella mutagenicity outcomes. The first step consists of deriving a statistical model and using it to predict mutagenicity; the second step applies further checks for specific structural alerts to the "safe" subset of the prediction outcome space. In terms of accuracy (i.e., overall correct predictions of both negatives and positives), the obtained model approached the 85% reproducibility of the experimental Ames mutagenicity test.

Conclusions

The model and the documentation for regulatory purposes are freely available on the CAESAR website. The input is simply a file of molecular structures and the output is the classification result.
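The two-step cascade described in the Results can be illustrated with a minimal Python sketch. It is not the CAESAR implementation: the statistical model is a stand-in callable, the alert table `ILLUSTRATIVE_ALERTS` and its SMILES substring matching are hypothetical simplifications (a real alert check would use proper substructure matching), and the label lists at the end are invented toy values, not Ames results. The sketch only shows the control flow: alerts are checked solely for compounds the first step calls "safe", so the second step can remove false negatives but never add them.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical alert table: name -> SMILES fragment, matched as a plain
# substring. This is an illustration only, not a real alert rulebase.
ILLUSTRATIVE_ALERTS: Dict[str, str] = {
    "nitro group": "[N+](=O)[O-]",
    "epoxide": "C1OC1",
}


def find_alerts(smiles: str, alerts: Dict[str, str] = ILLUSTRATIVE_ALERTS) -> List[str]:
    """Return the names of all alert fragments found in the SMILES string."""
    return [name for name, fragment in alerts.items() if fragment in smiles]


def cascade_predict(smiles: str, statistical_model: Callable[[str], bool]) -> str:
    """Step 1: statistical prediction. Step 2: alert check on 'safe' compounds only."""
    if statistical_model(smiles):
        return "mutagen"
    # Only compounds predicted safe reach the alert check.
    if find_alerts(smiles):
        return "mutagen"
    return "non-mutagen"


def accuracy(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of correct predictions over both positives and negatives."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def count_false_negatives(y_true: Sequence[bool], y_pred: Sequence[bool]) -> int:
    """Number of true mutagens that were predicted safe."""
    return sum(t and not p for t, p in zip(y_true, y_pred))


# Stand-in first-step models (assumptions, used only to exercise the cascade).
def always_safe(smiles: str) -> bool:
    return False


def always_mutagen(smiles: str) -> bool:
    return True


# Invented toy labels (True = mutagen), not experimental data.
y_true = [True, True, False, False]
step1_only = [True, False, False, False]
after_cascade = [True, True, False, True]
```

With these toy labels, the cascade removes the single false negative while introducing one false positive, so accuracy stays the same. This is the kind of trade-off a design aimed at reducing false negatives accepts.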
