Feature Selection and Linear/Nonlinear Regression Methods for the Accurate Prediction of Glycogen Synthase Kinase-3β Inhibitory Activities

A small number of variables were selected from a pool of calculated Dragon descriptors using three feature selection methods: genetic algorithm (GA), successive projections algorithm (SPA), and fuzzy rough set ant colony optimization (fuzzy rough set ACO). Each set of selected descriptors was then regressed against the bioactivities of a series of glycogen synthase kinase-3β (GSK-3β) inhibitors using linear and nonlinear regression methods: multiple linear regression (MLR), artificial neural network (ANN), and support vector machine (SVM). The fuzzy rough set ACO/SVM-based model gave the best estimation and prediction results. This indicates that the relationship between the selected descriptors and activity is nonlinear, and suggests that fuzzy rough set ACO, introduced into chemistry for the first time in this work, is an improved variable selection method in QSAR for this class of GSK-3β inhibitors.
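
As an illustration of the select-then-regress workflow, the following is a minimal sketch of one of the simpler combinations, GA selection scored by an MLR model. It is not the authors' implementation. The descriptor matrix is synthetic random data rather than Dragon output, the function names (`loo_q2`, `ga_select`) and all GA settings are hypothetical, and the fitness criterion (leave-one-out Q^2) is an assumption chosen for the example. SPA, fuzzy rough set ACO, ANN, and SVM are not shown.

```python
import numpy as np


def loo_q2(X, y):
    """Leave-one-out cross-validated Q^2 of an ordinary least-squares (MLR) fit."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # prepend an intercept column
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # hold out sample i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def ga_select(X, y, n_keep=3, pop_size=20, n_gen=15, p_mut=0.2, seed=0):
    """Toy genetic algorithm: evolve subsets of n_keep columns, scored by loo_q2."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]

    def fitness(subset):
        return loo_q2(X[:, list(subset)], y)

    # Each individual is a sorted tuple of n_keep distinct column indices.
    pop = [tuple(sorted(int(j) for j in rng.choice(n_feat, n_keep, replace=False)))
           for _ in range(pop_size)]
    for _ in range(n_gen):
        # Elitism: the better half of the population survives unchanged.
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(len(elite), 2, replace=False)
            # Crossover: draw the child's columns from the pooled parent columns.
            genes = sorted(set(elite[a]) | set(elite[b]))
            child = [int(j) for j in rng.choice(genes, n_keep, replace=False)]
            if rng.random() < p_mut:
                # Mutation: swap one column for a column the child does not use.
                unused = [j for j in range(n_feat) if j not in child]
                child[int(rng.integers(n_keep))] = int(rng.choice(unused))
            children.append(tuple(sorted(child)))
        pop = elite + children
    return list(max(pop, key=fitness))


# Synthetic stand-in for a descriptor table: 40 compounds, 10 descriptors,
# with the activity depending (by construction) on columns 2, 5, and 7 only.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 10))
y = 2.0 * X[:, 2] - 1.5 * X[:, 5] + 1.0 * X[:, 7] + rng.normal(scale=0.1, size=40)

selected = ga_select(X, y)
print("selected columns:", selected, "LOO Q^2:", round(loo_q2(X[:, selected], y), 3))
```

In a comparison like the one described above, the same selected columns would then be passed to each regression method in turn, so that differences in predictive performance can be attributed to the regressor rather than to the descriptor set.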