Paper information - Prediction of genotoxicity of chemical compounds by statistical learning methods.

Various toxicological profiles, such as genotoxic potential, need to be studied during drug discovery and submitted to drug regulatory authorities for safety evaluation. As part of the effort to develop low-cost and efficient tools for testing adverse drug reactions, several statistical learning methods have been used to build genotoxicity prediction systems, with accuracies of up to 73.8% for genotoxic (GT+) and 92.8% for nongenotoxic (GT-) agents. These systems were developed and tested using fewer than 400 known GT+ and GT- agents, a set significantly smaller and less diverse than the 860 GT+ and GT- agents known at present. There is a need to examine whether a similar level of accuracy can be achieved for this more diverse set of molecules, and to evaluate other statistical learning methods not yet applied to genotoxicity prediction. This work tests several statistical learning methods on the 860 GT+ and GT- agents: support vector machines (SVM), probabilistic neural network (PNN), k-nearest neighbor (k-NN), and the C4.5 decision tree (DT). A feature selection method, recursive feature elimination, is used to select the molecular descriptors relevant to genotoxicity. The overall accuracies of SVM, k-NN, and PNN are comparable to the results of earlier studies, while those of DT are lower; SVM gives the highest accuracies, 77.8% for GT+ and 92.7% for GT- agents. Our study suggests that statistical learning methods, particularly SVM, k-NN, and PNN, are useful for predicting the genotoxic potential of a diverse set of molecules.
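The abstract reports accuracy separately for GT+ and GT- agents. As a rough illustration of what those two figures measure, the sketch below classifies a few made-up compounds with a plain k-nearest-neighbor vote and then computes the per-class accuracies. Everything in it is an assumption made for illustration: the two-dimensional "descriptor" vectors, the labels, the choice of k = 3, and the function names `knn_predict` and `per_class_accuracy` are hypothetical and are not the descriptors, data, or implementation used in the paper.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training points closest to x.

    Distance is plain Euclidean distance over the descriptor vectors.
    """
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def per_class_accuracy(y_true, y_pred, positive="GT+", negative="GT-"):
    """Fraction of GT+ agents predicted GT+, of GT- agents predicted GT-,
    and the overall fraction of correct predictions."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t == negative and p == negative for t, p in zip(y_true, y_pred))
    n_pos = sum(t == positive for t in y_true)
    n_neg = sum(t == negative for t in y_true)
    return {
        positive: tp / n_pos,
        negative: tn / n_neg,
        "overall": (tp + tn) / len(y_true),
    }


# Toy data (hypothetical): each tuple stands in for a molecular descriptor vector.
train_X = [(0.9, 0.8), (0.8, 0.9), (1.0, 1.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.0)]
train_y = ["GT+", "GT+", "GT+", "GT-", "GT-", "GT-"]

test_X = [(0.85, 0.85), (0.7, 0.6), (0.15, 0.15), (0.3, 0.4)]
test_y = ["GT+", "GT+", "GT-", "GT+"]

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
acc = per_class_accuracy(test_y, preds)
print(preds)
print(acc)
```

In this toy run the last test compound is a GT+ agent that lies closer to the GT- training points, so it is misclassified. That lowers the GT+ accuracy while the GT- accuracy is unaffected, which is why the two figures are reported separately rather than as a single overall number.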
