In silico Prediction of Chemical Ames Mutagenicity

Mutagenicity is one of the most important end points of toxicity. Because experimental tests are costly and laborious, robust in silico methods for predicting chemical mutagenicity are needed. In this paper, a comprehensive database containing 7617 diverse compounds, including 4252 mutagens and 3365 nonmutagens, was constructed. On the basis of this data set, highly predictive models were built using five machine learning methods, namely support vector machine (SVM), C4.5 decision tree (C4.5 DT), artificial neural network (ANN), k-nearest neighbors (kNN), and naïve Bayes (NB), combined with five fingerprints, namely CDK fingerprint (FP), Estate fingerprint (Estate), MACCS keys (MACCS), PubChem fingerprint (PubChem), and Substructure fingerprint (SubFP). Performance was measured by cross validation and on an external test set containing 831 diverse chemicals. Information gain and substructure analysis were used to interpret the models. The fivefold cross-validation accuracies of the top five models ranged from 0.808 to 0.841. Accuracy on the external validation set ranged from 0.904 to 0.980, higher than that of Toxtree. Three models (PubChem-kNN, MACCS-kNN, and PubChem-SVM) showed high and reliable predictive accuracy for both mutagens and nonmutagens and, hence, could be used to predict chemical Ames mutagenicity.
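
The abstract names information gain as one of the tools used to interpret the models but does not say how it was computed. The sketch below assumes the standard definition, the reduction in Shannon entropy of the class labels after splitting on a single binary fingerprint bit. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(bits, labels):
    """Entropy reduction in `labels` after splitting on one binary bit."""
    on = [y for b, y in zip(bits, labels) if b]
    off = [y for b, y in zip(bits, labels) if not b]
    total = len(labels)
    conditional = (len(on) / total) * entropy(on) \
        + (len(off) / total) * entropy(off)
    return entropy(labels) - conditional


# Toy data, invented for illustration: 1 = mutagen, 0 = nonmutagen.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# A bit that is set only in mutagens separates the classes completely.
perfect_bit = [1, 1, 1, 1, 0, 0, 0, 0]
# A bit that is set equally often in both classes carries no information.
uninformative_bit = [1, 1, 0, 0, 1, 1, 0, 0]

gain_perfect = information_gain(perfect_bit, labels)
gain_uninformative = information_gain(uninformative_bit, labels)
```

Under this definition, ranking every fingerprint bit by its gain and inspecting the substructures behind the top-ranked bits is one plausible way to connect a model back to chemistry, although the paper's actual procedure may differ.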
