Predictive Modeling of Estrogen Receptor Binding Agents Using Advanced Cheminformatics Tools and Massive Public Data

Estrogen receptor α (ERα) is a critical target for drug design, as well as a potential source of toxicity when activated unintentionally. Evaluating potential ERα binding agents is therefore important in both drug discovery and chemical toxicity research. Computational tools, e.g., Quantitative Structure-Activity Relationship (QSAR) models, can predict potential ERα binding agents before chemical synthesis. The purpose of this project was to develop enhanced predictive models of ERα binding agents by utilizing advanced cheminformatics tools that can integrate publicly available bioassay data. The initial ERα binding agent data set, consisting of 446 binders and 8307 non-binders, was obtained from the Tox21 Challenge project organized by the NIH Chemical Genomics Center (NCGC). After duplicates and inorganic compounds were removed, this data set was used to create a balanced training set (259 binders and 259 non-binders). This training set was used to develop QSAR models based on chemical descriptors. The resulting models were then used to predict the binding activity of 264 external compounds, which became available to us after the models were developed. The cross-validation performance on the training set [Correct Classification Rate (CCR) = 0.72] was much higher than the external predictivity for these unknown compounds (CCR = 0.59). To improve the conventional QSAR models, all compounds in the training set were used to search PubChem and generate a profile of their biological responses across thousands of bioassays. The most important bioassays were prioritized to generate a similarity index, which was used to calculate the biosimilarity score between each pair of compounds. The nearest neighbors of each compound were then identified, and its ERα binding potential was predicted from its nearest neighbors in the training set.
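The biosimilarity-based nearest-neighbor step described above can be illustrated with a minimal Python sketch. The abstract does not give the similarity formula, so everything here is an assumption: each compound's profile is a list over the prioritized bioassays, encoded as +1 (active), -1 (inactive) or 0 (not tested); biosimilarity is a Tanimoto-style ratio of shared active responses to all assays, tested for both compounds, in which at least one of the two is active; and a compound is labeled by a similarity-weighted vote of its k most biosimilar training compounds. The names `biosimilarity` and `knn_predict`, the value of k, and the toy profiles are hypothetical.

```python
from typing import List, Sequence

# Assumed encoding for each prioritized bioassay:
# +1 = active, -1 = inactive, 0 = not tested.
Profile = Sequence[int]


def biosimilarity(p: Profile, q: Profile) -> float:
    """Tanimoto-style biosimilarity between two bioassay response profiles.

    Only assays tested for both compounds are considered. Shared actives
    count toward agreement; active/inactive conflicts count against it;
    shared inactives are ignored (a modeling assumption).
    """
    both_active = 0
    conflicts = 0
    for x, y in zip(p, q):
        if x == 0 or y == 0:
            continue  # assay missing for at least one compound
        if x == 1 and y == 1:
            both_active += 1
        elif x != y:
            conflicts += 1
    denom = both_active + conflicts
    return both_active / denom if denom else 0.0


def knn_predict(
    query: Profile,
    train_profiles: List[Profile],
    train_labels: List[int],
    k: int = 3,
    exclude: int = -1,
) -> int:
    """Predict binder (1) or non-binder (0) from the k most biosimilar training compounds.

    `exclude` skips one training index, e.g. the query itself during
    leave-one-out cross-validation.
    """
    scored = [
        (biosimilarity(query, prof), label)
        for i, (prof, label) in enumerate(zip(train_profiles, train_labels))
        if i != exclude
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    neighbors = scored[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        # No informative neighbors: fall back to an unweighted majority vote.
        votes = sum(label for _, label in neighbors)
        return int(2 * votes > len(neighbors))
    weighted = sum(sim * label for sim, label in neighbors) / total
    return int(weighted >= 0.5)


# Toy data (not from the study): two binders and two non-binders over four assays.
profiles = [
    [1, 1, -1, 0],
    [1, 1, -1, -1],
    [-1, -1, 1, 1],
    [-1, 0, 1, 1],
]
labels = [1, 1, 0, 0]
query = [1, 1, 0, -1]
print(knn_predict(query, profiles, labels, k=2))  # → 1
```

Passing `exclude=i` when predicting training compound i gives a leave-one-out style cross-validation; the study's actual cross-validation scheme and neighbor count are not stated in the abstract.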
The hybrid model (CCR = 0.94 for cross-validation; CCR = 0.68 for external prediction) showed significant improvement over the original QSAR models, particularly for the activity cliffs that induce prediction errors. These results indicate that the response profiles of chemicals from public data provide useful information for modeling and evaluation purposes. Public big data resources should therefore be considered alongside chemical structure information when predicting new compounds, such as unknown ERα binding agents.
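The CCR values above are reported without a definition. In the QSAR literature this work draws on, CCR is usually computed as the mean of sensitivity and specificity, CCR = 0.5 × [TP/(TP+FN) + TN/(TN+FP)], which keeps a model from scoring well simply by predicting the majority class, a real risk with data as imbalanced as the original Tox21 set (446 binders versus 8307 non-binders). The minimal Python sketch below assumes that definition; the `ccr` function name and the toy labels are illustrative, not results from the study.

```python
from typing import Sequence


def ccr(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Correct Classification Rate = mean of sensitivity and specificity.

    Labels are 1 for binders and 0 for non-binders.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("CCR needs at least one binder and one non-binder")
    sensitivity = tp / (tp + fn)  # fraction of binders correctly identified
    specificity = tn / (tn + fp)  # fraction of non-binders correctly identified
    return 0.5 * (sensitivity + specificity)


# Illustrative labels only: 4 binders, 4 non-binders.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(ccr(y_true, y_pred))  # → 0.625
```

On a balanced set such as the 259/259 training set, CCR equals plain accuracy; the two diverge only when the classes are imbalanced, where a model that labels everything a non-binder still scores just 0.5.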