GA Strategy for Variable Selection in QSAR Studies: Application of GA-Based Region Selection to a 3D-QSAR Study of Acetylcholinesterase Inhibitors

Comparative molecular field analysis (CoMFA) with partial least squares (PLS) is one of the most frequently used tools in three-dimensional quantitative structure-activity relationship (3D-QSAR) studies. Although many successful applications have proved the value of CoMFA, its proper application still poses some problems. In particular, the inability of PLS to handle a low signal-to-noise ratio (a low sample-to-variable ratio) has attracted much attention from QSAR researchers, and several variable selection methods have been proposed. We recently developed a novel variable selection method for CoMFA modeling, GARGS (genetic algorithm-based region selection), and demonstrated its utility in a previous paper (Kimura, T., et al. J. Chem. Inf. Comput. Sci. 1998, 38, 276-282). The purpose of this study is to evaluate whether GARGS can pinpoint known molecular interactions in 3D space. As a test example, we used a published set of acetylcholinesterase (AChE) inhibitors. Applying GARGS to this data set yielded several improved models with high internal predictivity and a small number of field variables. External validation was performed to select a final model among them. The coefficient contour maps of the final GARGS model were compared with the properties of the AChE active site, and the consistency between them was evaluated.
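The general idea of genetic-algorithm region selection scored by internal prediction can be illustrated with a minimal sketch. This is not the GARGS implementation: the abstract does not specify the fitness function, the GA operators, or the PLS settings, so everything below is an assumption made for illustration. The sketch assumes leave-one-out cross-validated q2 as the fitness, a fixed number of PLS components, tournament selection, uniform crossover, bit-flip mutation, and elitism. All function names (`pls1_fit`, `loo_q2`, `ga_region_selection`) and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """PLS1 (NIPALS) regression vector for centered X and y."""
    X, y = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        norm = np.linalg.norm(w)
        if norm < 1e-12:          # nothing left to explain
            break
        w /= norm
        t = X @ w                 # score vector
        tt = t @ t
        p = X.T @ t / tt          # X loading
        c = (y @ t) / tt          # y loading
        X -= np.outer(t, p)       # deflate X and y
        y -= c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def loo_q2(X, y, n_comp=2):
    """Leave-one-out cross-validated q2 (assumed fitness measure)."""
    n, press = len(y), 0.0
    for i in range(n):
        tr = np.arange(n) != i
        xm, ym = X[tr].mean(axis=0), y[tr].mean()
        b = pls1_fit(X[tr] - xm, y[tr] - ym, n_comp)
        press += (y[i] - (ym + (X[i] - xm) @ b)) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_region_selection(X, y, region_of, n_regions,
                        pop_size=20, n_gen=15, p_mut=0.05, seed=0):
    """Evolve a boolean mask over regions; region_of maps column -> region."""
    rng = np.random.default_rng(seed)

    def fitness(mask):
        cols = mask[region_of]    # expand region mask to column mask
        return loo_q2(X[:, cols], y) if cols.any() else -np.inf

    pop = rng.random((pop_size, n_regions)) < 0.5
    best_mask, best_fit = pop[0].copy(), -np.inf
    for _ in range(n_gen):
        fits = np.array([fitness(m) for m in pop])
        i = fits.argmax()
        if fits[i] > best_fit:
            best_fit, best_mask = fits[i], pop[i].copy()
        children = [best_mask.copy()]             # elitism
        while len(children) < pop_size:
            # binary tournament picks each parent
            a, b = (pop[max(rng.choice(pop_size, 2), key=lambda j: fits[j])]
                    for _ in range(2))
            child = np.where(rng.random(n_regions) < 0.5, a, b)  # uniform crossover
            child ^= rng.random(n_regions) < p_mut               # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    return best_mask, best_fit

if __name__ == "__main__":
    # Synthetic stand-in for field data: 8 regions of 5 variables each,
    # with activity driven by regions 0 and 3 only.
    rng = np.random.default_rng(1)
    region_of = np.repeat(np.arange(8), 5)
    X = rng.normal(size=(30, 40))
    y = X[:, region_of == 0].sum(axis=1) - X[:, region_of == 3].sum(axis=1)
    y += 0.1 * rng.normal(size=30)
    mask, q2 = ga_region_selection(X, y, region_of, 8)
    print("selected regions:", np.flatnonzero(mask), "q2 = %.3f" % q2)
```

In a real CoMFA setting the columns of `X` would be steric and electrostatic field values at lattice points and `region_of` would group neighboring lattice points into spatial regions; how GARGS defines those regions is described in the cited 1998 paper, not here.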
