Toward an Optimal Procedure for Variable Selection and QSAR Model Building

In this work, we report the development of a novel QSAR technique that combines genetic algorithms and neural networks to select a subset of relevant descriptors and to build the optimal neural network architecture for QSAR studies. The technique uses a neural network to map the descriptors preselected by the genetic algorithm to the dependent property of interest. It differs from other variable selection techniques that combine genetic algorithms with neural networks in two main features: (1) the variable selection search performed by the genetic algorithm is not constrained to a defined number of descriptors; (2) the optimal neural network architecture is explored in parallel with the variable selection by dynamically modifying the size of the hidden layer. Using both artificial data and real biological data, we show that this technique can build both classification and regression models and that it outperforms simpler variable selection techniques, mainly for nonlinear data sets. The results obtained on real data are compared with previous work that used other modeling techniques. We also discuss some important issues in building QSAR models and good practices for QSAR studies.
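
The two features listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name in it (`evolve`, `mutate`, `crossover`, `toy_fitness`, `RELEVANT`) and every parameter value is hypothetical. Each individual pairs a descriptor mask of unconstrained size with a hidden-layer size that mutation can grow or shrink. The fitness function is a stand-in; in a real study it would train a neural network on the selected descriptors and score it on held-out data.

```python
import random

def random_individual(n_desc, max_hidden, rng):
    # Mask may switch on any number of descriptors (no fixed subset size).
    mask = [rng.random() < 0.5 for _ in range(n_desc)]
    if not any(mask):
        mask[rng.randrange(n_desc)] = True
    return mask, rng.randint(1, max_hidden)

def mutate(ind, max_hidden, rng, p_flip=0.1):
    mask, hidden = ind
    mask = [(not b) if rng.random() < p_flip else b for b in mask]
    if not any(mask):  # keep at least one descriptor selected
        mask[rng.randrange(len(mask))] = True
    # Grow, keep, or shrink the hidden layer by one unit.
    hidden = min(max_hidden, max(1, hidden + rng.choice((-1, 0, 1))))
    return mask, hidden

def crossover(a, b, rng):
    # Uniform crossover on the mask; hidden size inherited from one parent.
    mask = [x if rng.random() < 0.5 else y for x, y in zip(a[0], b[0])]
    if not any(mask):
        mask[rng.randrange(len(mask))] = True
    return mask, rng.choice((a[1], b[1]))

def evolve(fitness, n_desc, max_hidden=8, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(n_desc, max_hidden, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   max_hidden, rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=fitness)

# Stand-in fitness (assumption): descriptors 0, 3 and 5 are "relevant" and
# three hidden units are "ideal". A real fitness would train and validate
# a neural network for each individual.
RELEVANT = {0, 3, 5}

def toy_fitness(ind):
    mask, hidden = ind
    chosen = {i for i, b in enumerate(mask) if b}
    return (len(chosen & RELEVANT)
            - 0.5 * len(chosen - RELEVANT)
            - 0.1 * abs(hidden - 3))

best_mask, best_hidden = evolve(toy_fitness, n_desc=10)
print([i for i, b in enumerate(best_mask) if b], best_hidden)
```

Because the hidden-layer size travels with the descriptor mask in the same individual, selection acts on both at once, which is one simple way to explore the architecture in parallel with the variable selection.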
