A comparative study of support vector machine, artificial neural network and Bayesian classifier for mutagenicity prediction

Mutagenicity is the ability of a chemical to induce mutations in the genetic material of an organism. To reduce costly drug failures caused by mutagenicity discovered late in development or even in clinical trials, it is crucial to identify potential mutagenicity problems as early as possible. In this work we propose three different classifiers, Support Vector Machine (SVM), Artificial Neural Network (ANN) and Bayesian classifiers, for predicting the mutagenicity of compounds from seventeen descriptors. Among the three, the SVM classifier with a Radial Basis Function (RBF) kernel appeared to be the most accurate at classifying the compounds under study as mutagens or non-mutagens. The overall prediction accuracy of the SVM model was 71.73%, appreciably higher than that of the ANN-based classifier (59.72%) and the Bayesian classifier (66.61%). This suggests that, for the data under consideration, an SVM-based model can predict mutagenicity more accurately than ANN and Bayesian classifiers.
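The abstract names an RBF-kernel SVM as the best of the three classifiers and compares them by overall accuracy. The sketch below is not the authors' implementation. It writes out, in plain Python, the RBF kernel K(x, z) = exp(-gamma * ||x - z||^2), the kernel-SVM decision function f(x) = sum_i alpha_i y_i K(s_i, x) + b, and overall accuracy as the fraction of compounds classified correctly. Several details are assumptions made only for illustration: the function names, the 1 = mutagen / 0 = non-mutagen label coding, and the support vectors, dual coefficients, bias and gamma used in the toy example. In practice those values would come from training an SVM solver on the seventeen descriptors.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF (Gaussian) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    if len(x) != len(z):
        raise ValueError("descriptor vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma) -> float:
    """Kernel SVM decision value f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b.

    dual_coefs[i] holds the product alpha_i * y_i for support vector s_i.
    These values are hypothetical here; a trained solver would supply them.
    """
    return sum(
        c * rbf_kernel(s, x, gamma) for s, c in zip(support_vectors, dual_coefs)
    ) + bias


def predict(x, support_vectors, dual_coefs, bias, gamma) -> int:
    """Assumed label coding: 1 = mutagen, 0 = non-mutagen."""
    return 1 if svm_decision(x, support_vectors, dual_coefs, bias, gamma) >= 0 else 0


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds classified correctly, i.e. (TP + TN) / N."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy example with 17-dimensional descriptor vectors (all values hypothetical).
support_vectors = [[1.0] * 17, [0.0] * 17]
dual_coefs = [1.0, -1.0]  # alpha_i * y_i for each support vector
bias, gamma = 0.0, 0.1

compounds = [[0.9] * 17, [0.1] * 17]
labels = [1, 0]
preds = [predict(c, support_vectors, dual_coefs, bias, gamma) for c in compounds]
print(preds)                             # → [1, 0]
print(overall_accuracy(labels, preds))   # → 1.0
```

The RBF kernel makes the decision value depend on a compound's distance to each support vector in descriptor space, so gamma controls how local that influence is. Because the descriptors can live on very different scales, they are commonly standardized before the distances are computed.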
