Recently, two evolutionary strategies for the derivation of regression models have been described: a genetic function approximation and the mutation/selection algorithm MUSEUM. The MUSEUM (Mutation and Selection Uncover Models) algorithm starts from a model containing randomly chosen variables. Random mutation, first by addition or elimination of only one or very few variables, afterwards by simultaneous random additions, eliminations and/or exchanges of several variables at a time, leads to new models that are evaluated by an appropriate fitness function. Only the "fittest" model is stored and used for further mutation and selection, leading to progressively better models. However, the fitness of all models with up to three X variables can be determined much faster by calculating the multiple correlation coefficients ry.ij and ry.ijk from the simple and partial correlation coefficients ryi, rij, ryj.i, rjk.i and ryk.ij. Using the Selwood data set (n = 31 compounds, k = 53 variables), it is demonstrated that systematic search is the best strategy for regression models with two or three X variables. The variables contained in the best three-variable models can then be selected for further investigation using the evolutionary approach. With the exception of complex models containing six or more variables, nearly all relevant regression models are found by this combination of systematic search with the mutation/selection algorithm MUSEUM; the results are obtained in considerably shorter time than by including all variables in the calculations. In addition, systematic search is a valuable tool for variable selection prior to stepwise regression and PLS analyses.
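The shortcut for two- and three-variable models can be illustrated with a minimal sketch. It rests on the standard identities 1 - R²(y.ij) = (1 - ryi²)(1 - ryj.i²) and 1 - R²(y.ijk) = (1 - ryi²)(1 - ryj.i²)(1 - ryk.ij²), so that every model is scored from one precomputed correlation matrix without refitting a regression. The function names (`partial_corr`, `r2_two`, `r2_three`, `best_models`) and the use of R² as the fitness value are assumptions made for illustration, not the implementation or fitness function used in the paper; the sketch also assumes no constant or perfectly collinear columns.

```python
import itertools
import numpy as np


def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation r_ab.c from three correlations of the next lower order."""
    return (r_ab - r_ac * r_bc) / np.sqrt((1.0 - r_ac**2) * (1.0 - r_bc**2))


def r2_two(R, y, i, j):
    """Squared multiple correlation R^2(y.ij) from the correlation matrix R."""
    r_yj_i = partial_corr(R[y, j], R[y, i], R[i, j])
    return 1.0 - (1.0 - R[y, i] ** 2) * (1.0 - r_yj_i**2)


def r2_three(R, y, i, j, k):
    """Squared multiple correlation R^2(y.ijk) from the correlation matrix R."""
    r_yj_i = partial_corr(R[y, j], R[y, i], R[i, j])
    r_yk_i = partial_corr(R[y, k], R[y, i], R[i, k])
    r_jk_i = partial_corr(R[j, k], R[i, j], R[i, k])
    # Second-order partial r_yk.ij, built from the first-order partials above.
    r_yk_ij = partial_corr(r_yk_i, r_yj_i, r_jk_i)
    return 1.0 - (1.0 - R[y, i] ** 2) * (1.0 - r_yj_i**2) * (1.0 - r_yk_ij**2)


def best_models(X, y, size, top=5):
    """Exhaustively score all models with `size` (2 or 3) X variables.

    Returns the `top` models as (R^2, column indices) pairs, best first.
    """
    if size not in (2, 3):
        raise ValueError("this sketch only covers two- and three-variable models")
    n_vars = X.shape[1]
    # One correlation matrix for all X columns plus y (y is the last index).
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    score = r2_two if size == 2 else r2_three
    results = [
        (float(score(R, n_vars, *combo)), combo)
        for combo in itertools.combinations(range(n_vars), size)
    ]
    results.sort(reverse=True)
    return results[:top]


if __name__ == "__main__":
    # Synthetic data only; this is NOT the Selwood data set.
    rng = np.random.default_rng(0)
    X_demo = rng.normal(size=(31, 10))
    y_demo = 2.0 * X_demo[:, 1] - X_demo[:, 4] + 0.1 * rng.normal(size=31)
    for r2, cols in best_models(X_demo, y_demo, size=3, top=3):
        print(cols, round(r2, 4))
```

In a combined workflow of the kind described above, the column indices returned for the best three-variable models would form the reduced variable pool passed on to the evolutionary search.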