Predicting the Androgenicity of Structurally Diverse Compounds from Molecular Structure Using Different Classifiers

Many environmental and industrial chemicals are reported to have androgenic or antiandrogenic activity. Such chemicals may act as hormones and have the potential to disrupt the endocrine systems of wildlife and humans. In this study, three machine learning methods, the probabilistic neural network (PNN), the support vector machine (SVM), and learning vector quantization (LVQ), were used to develop binary classification models that predict androgenicity directly from the molecular structures of organic compounds, each represented by only eleven numerical descriptors. Among the individual models, the PNN model achieved the best overall classification rate on the prediction set, 86.67%, with a Matthews correlation coefficient of 0.64, while the LVQ model gave the lowest false negative rate, 0.00%, a property that tends to be given relatively high priority in toxicological evaluation. In addition, a consensus model was built that integrates all three basic model types. This consensus model correctly predicted the androgenicity of 86.67% of the prediction set compounds, with a false negative rate of 0.00% and a Matthews correlation coefficient of 0.65, the highest of all the models. These results indicate that the proposed classification models could provide a feasible and practical tool for the rapid screening of potential androgens.
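The evaluation measures named above (overall classification rate, false negative rate, and Matthews correlation coefficient) can be illustrated with a minimal Python sketch. The abstract does not say how the three models were combined into the consensus model, so the majority vote used here is an assumption, and the labels and predictions are hypothetical toy values rather than data from the study.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, 1 = androgenic, 0 = not."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Overall classification rate: fraction of compounds predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def false_negative_rate(tp, fn):
    """Fraction of true actives that the model labels as inactive."""
    return fn / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def majority_vote(*predictions):
    """Assumed consensus rule: a compound is active if most models say so."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*predictions)]


# Hypothetical toy labels and per-model predictions (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
pnn_pred = [1, 1, 0, 0, 0, 0, 1, 0]
svm_pred = [1, 0, 1, 0, 1, 0, 1, 0]
lvq_pred = [1, 1, 1, 1, 0, 0, 1, 1]

consensus_pred = majority_vote(pnn_pred, svm_pred, lvq_pred)

for name, pred in [("PNN", pnn_pred), ("SVM", svm_pred),
                   ("LVQ", lvq_pred), ("consensus", consensus_pred)]:
    tp, tn, fp, fn = confusion_counts(y_true, pred)
    print(name, accuracy(tp, tn, fp, fn), false_negative_rate(tp, fn),
          mcc(tp, tn, fp, fn))
```

In this toy example a model with no false negatives (the hypothetical LVQ predictions) pays for it with extra false positives, which is the trade-off that makes the false negative rate and the Matthews correlation coefficient worth reporting alongside the overall classification rate.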
