The Effect of Variable Selection on the Non‐linear Modelling of Oestrogen Receptor Binding

The Relative Binding Affinity (RBA) of a chemical to the oestrogen receptor is often used as a measure of the oestrogenicity of endocrine-disrupting chemicals. Quantitative Structure-Activity Relationship (QSAR) models of these binding affinities have been developed with three-dimensional approaches such as Comparative Molecular Field Analysis (CoMFA). Such techniques are of limited use for chemically diverse data sets, however, because the molecules are difficult to align. The aim of the present study was to use non-linear methods to model the oestrogen receptor RBA of a large, diverse set of chemicals. To this end, several variable selection methods were applied to a large pool of descriptors: stepwise regression, partial least squares (PLS) and recursive partitioning (Formal Inference-based Recursive Modelling, FIRM). The selected descriptors were used to build Counter-Propagation Neural Networks (CPNNs) and Support Vector Machines (SVMs), and the models were compared on how well they predicted the activities of an external validation set. Although the descriptors selected by the different methods overlapped to some degree, the predictive power of the resulting CPNN and SVM models varied. The variables selected by stepwise regression led to poor CPNN models, yet they gave the SVM model with the best external predictivity. The descriptors selected by some of the FIRM methods performed better in the CPNN models. © 2006 Wiley-VCH Verlag GmbH & Co. KGaA.
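To make the CPNN step and the external-validation comparison more concrete, here is a minimal sketch of a generic counter-propagation network: a Kohonen (input) layer that maps descriptor vectors onto a grid of neurons, plus an output layer that stores a predicted activity for each neuron. It is not the study's implementation. The grid size, number of epochs, linearly decreasing learning rate, triangular neighbourhood function and the synthetic two-descriptor data are all illustrative assumptions. The predictivity statistic `q2_ext` uses the training-set mean in its denominator, which is one common convention and is also an assumption here.

```python
import random


def _winner(w_in, x):
    # Winning neuron: smallest squared Euclidean distance between its input weights and x.
    return min(w_in, key=lambda pos: sum((w - v) ** 2 for w, v in zip(w_in[pos], x)))


def train_cpnn(X, y, grid=5, epochs=50, eta_max=0.5, eta_min=0.01, seed=0):
    """Sketch of counter-propagation training (all hyperparameters are hypothetical)."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    # Kohonen (input) layer and output layer, one entry per grid position, random start.
    w_in = {(i, j): [rng.random() for _ in range(n_desc)]
            for i in range(grid) for j in range(grid)}
    w_out = {pos: rng.random() for pos in w_in}
    total = epochs * len(X)
    step = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for k in order:
            frac = step / total
            # Learning rate and neighbourhood radius both shrink linearly during training.
            eta = eta_max - (eta_max - eta_min) * frac
            radius = int(round((grid - 1) * (1 - frac)))
            winner = _winner(w_in, X[k])
            for pos in w_in:
                # Chebyshev distance on the grid defines the neighbourhood rings.
                d = max(abs(pos[0] - winner[0]), abs(pos[1] - winner[1]))
                if d > radius:
                    continue
                a = eta * (1 - d / (radius + 1))  # triangular neighbourhood weighting
                w_in[pos] = [w + a * (x - w) for w, x in zip(w_in[pos], X[k])]
                w_out[pos] += a * (y[k] - w_out[pos])
            step += 1
    return w_in, w_out


def predict_cpnn(model, x):
    # The prediction is the output weight of the neuron that wins for x.
    w_in, w_out = model
    return w_out[_winner(w_in, x)]


def q2_ext(y_obs, y_pred, y_train_mean):
    # External Q^2 = 1 - PRESS / sum of squares about the training-set mean (assumed form).
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1 - press / ss


# Usage on synthetic data (NOT the RBA data set): activity depends on descriptor 0 only.
data_rng = random.Random(1)
X_train = [[data_rng.random(), data_rng.random()] for _ in range(100)]
y_train = [x[0] for x in X_train]
X_ext = [[data_rng.random(), data_rng.random()] for _ in range(30)]
y_ext = [x[0] for x in X_ext]

model = train_cpnn(X_train, y_train)
pred = [predict_cpnn(model, x) for x in X_ext]
y_mean = sum(y_train) / len(y_train)
print(round(q2_ext(y_ext, pred, y_mean), 2))
```

Under these assumptions, comparing descriptor subsets means feeding each selected subset into the same network settings and comparing the resulting `q2_ext` values on the same external set. Because a CPNN predicts a single stored value per neuron, its resolution is limited by the grid size. This is one reason different descriptor subsets can behave quite differently in a CPNN and in an SVM.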
