Genetic Algorithms in Feature Selection

Publisher Summary

One of the main problems in analyzing large data sets is detecting the relevant variables (i.e., the variables that carry information) and eliminating the noise. Feature selection aims to eliminate noise, to simplify the mathematical model, and to reduce the number of variables involved as much as possible. Genetic algorithms (GAs) can be applied to feature selection very easily. This chapter shows that very good results are obtained with a tailor-made GA configuration, in which the classical GA is slightly modified to take into account several peculiarities of the feature selection problem.

Hybrid algorithms are conceptually very simple. After a certain number of GA generations, the best solution found so far is refined with a classical optimization method (in the case of feature selection, stepwise selection). The resulting solutions can then enter the population, and a new GA run is started from the updated population. This approach further improves the performance of the GA. The chapter presents the application of GAs to two quantitative structure–activity relationship data sets and compares the results with those described in the literature.
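The hybrid scheme described above can be illustrated with a minimal sketch, assuming the common encoding in which each chromosome is a bit string with one bit per variable (1 = variable kept). The GA operators used here (truncation selection, single-point crossover, bit-flip mutation, rejection of duplicate chromosomes), all parameter values, and the scoring function `toy_fitness` are illustrative assumptions. This is not the chapter's tailor-made GA, and in a real QSAR application the fitness would instead be a measure of a regression model's predictive ability, such as a cross-validated score. The function names `stepwise`, `evolve` and `hybrid_ga` are hypothetical.

```python
import random
from typing import Callable, List

# A chromosome is a bit string: bit i == 1 means variable i is kept in the model.
Chromosome = List[int]
Fitness = Callable[[Chromosome], float]


def stepwise(chrom: Chromosome, fitness: Fitness) -> Chromosome:
    """Stepwise refinement: at each step add or remove the single variable
    whose change most improves fitness; stop when no change helps."""
    best, best_fit = list(chrom), fitness(chrom)
    while True:
        step, step_fit = None, best_fit
        for i in range(len(best)):
            cand = list(best)
            cand[i] ^= 1  # add variable i if absent, remove it if present
            if any(cand):  # never allow an empty model
                f = fitness(cand)
                if f > step_fit:
                    step, step_fit = cand, f
        if step is None:
            return best
        best, best_fit = step, step_fit


def evolve(pop: List[Chromosome], fitness: Fitness, generations: int,
           rng: random.Random, p_mut: float) -> List[Chromosome]:
    """Plain generational GA: keep the better half, refill with mutated
    single-point crossovers, rejecting empty and duplicate chromosomes."""
    size = len(pop)
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: size // 2]
        children: List[Chromosome] = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))
            # Crossover, then flip each bit with probability p_mut.
            child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
            if any(child) and child not in parents and child not in children:
                children.append(child)
        pop = parents + children
    return pop


def hybrid_ga(n_vars: int, fitness: Fitness, pop_size: int = 20,
              cycles: int = 5, gens_per_cycle: int = 10,
              p_mut: float = 0.05, seed: int = 0) -> Chromosome:
    """GA runs interleaved with stepwise refinement of the current best."""
    rng = random.Random(seed)
    pop: List[Chromosome] = []
    while len(pop) < pop_size:  # random, non-empty, unique initial population
        c = [int(rng.random() < 0.3) for _ in range(n_vars)]
        if any(c) and c not in pop:
            pop.append(c)
    for _ in range(cycles):
        pop = evolve(pop, fitness, gens_per_cycle, rng, p_mut)
        refined = stepwise(max(pop, key=fitness), fitness)
        if refined not in pop:  # the refined subset enters the population,
            pop.sort(key=fitness)  # replacing the current worst member
            pop[0] = refined
    return max(pop, key=fitness)


# Hypothetical stand-in for a real criterion: variables 1, 4 and 7 are
# "informative" (+1 each), every other selected variable is noise (-0.2 each).
RELEVANT = {1, 4, 7}


def toy_fitness(c: Chromosome) -> float:
    hits = sum(c[i] for i in RELEVANT)
    noise = sum(bit for i, bit in enumerate(c) if i not in RELEVANT)
    return hits - 0.2 * noise


print(hybrid_ga(10, toy_fitness))  # -> [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
```

Because the refined chromosome is only ever added to the population, and only in place of the worst member, the stepwise step cannot lower the best fitness in the population; the GA then continues from the updated population, mirroring the cycle described in the summary.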