In silico Prediction of Androgenic and Nonandrogenic Compounds Using Random Forest

The purpose of the present study was to develop in silico models that reliably distinguish androgenic from nonandrogenic compounds, based on a large, diverse dataset of 205 compounds. Random Forest (RF) was applied as a new classification method; its performance in classifying these compounds in a Quantitative Structure–Activity Relationship (QSAR) setting was evaluated and compared with that of the widely used Partial Least Squares (PLS) analysis on the same dataset. The predictive power of both methods was verified with five-fold cross-validation and an independent test set. For the RF model, the cross-validation prediction accuracies for androgenic and nonandrogenic compounds are 81.0% and 77.0%, respectively, and on average 87.3% of compounds are correctly classified in the external tests. PLS is slightly weaker, with average prediction accuracies of 75% and 74.7% for cross-validation and external validation, respectively. Our analysis demonstrates that RF is a powerful tool for building models on such data and should be valuable for virtual screening of androgen receptor-binding ligands.
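The validation protocol described above (five-fold cross-validation with accuracy reported separately for each class) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the data are synthetic, the function names are hypothetical, and the nearest-centroid classifier is a placeholder standing in for RF or PLS, neither of which is implemented here. It is not the study's dataset, descriptors, or code.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly classified samples, reported per true class."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Placeholder classifier (NOT a Random Forest): predict the nearest class centroid."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(X_train, y_train):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X_test]


def cross_validate(X, y, fit_predict, k=5, seed=0):
    """Pool out-of-fold predictions over k folds and return per-class accuracy."""
    folds = stratified_folds(y, k, seed)
    y_pred = [None] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in held_out]
        )
        for i, p in zip(held_out, preds):
            y_pred[i] = p
    return per_class_accuracy(y, y_pred)


# Synthetic stand-in data: 50 samples per class, 4 made-up descriptors each.
rng = random.Random(42)
X, y = [], []
for label, center in (("androgenic", 1.0), ("nonandrogenic", -1.0)):
    for _ in range(50):
        X.append([rng.gauss(center, 1.0) for _ in range(4)])
        y.append(label)

cv_accuracy = cross_validate(X, y, nearest_centroid_fit_predict)
print(cv_accuracy)
```

Any model that exposes the same `fit_predict(X_train, y_train, X_test)` shape could be passed to `cross_validate` in place of the placeholder, which is how an RF or a PLS-based discriminant model would be compared on identical folds. An external test set would be scored by calling `per_class_accuracy` on predictions for compounds held out of all folds.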
