A Feature Selection Method Based on Genetic Algorithms

Feature extraction is a major factor in achieving good classification results, but traditional feature extraction methods have several deficiencies. When the distribution of categories and features is highly imbalanced, they cannot deal effectively with low-frequency words, and relying on a single, improperly handled feature measure leads to locally optimal solutions. Because traditional methods cannot evaluate candidate feature words fully and effectively, we propose a text feature extraction method based on a genetic algorithm. In this method, several heuristics (word frequency, correlation, part of speech, and position) are combined to evaluate candidate features comprehensively, and a genetic algorithm optimizes the weight parameter of each heuristic. Experimental comparisons on different test sets show that, compared with traditional methods, the proposed method effectively avoids the bias produced by traditional feature extraction and obtains a representative feature set, which gives it practical value.

Keywords: feature extraction technology; classification; genetic algorithms; word frequency; feature set
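
The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the chromosome encoding, the fitness function, or the genetic operators, so everything below is an assumption made for illustration. Each candidate word is assumed to carry four heuristic scores in [0, 1] (frequency, correlation, part of speech, position), a chromosome is assumed to be a normalized weight vector, and the hypothetical fitness is the fraction of the top-k words that fall in a hand-labeled relevant set. The toy data in CANDIDATES and RELEVANT are invented.

```python
import random

# Order of the heuristic scores in every feature tuple (assumed layout).
HEURISTICS = ("frequency", "correlation", "part_of_speech", "position")

# Invented toy data: word -> heuristic scores in [0, 1].
CANDIDATES = {
    "report":  (1.0, 0.3, 0.4, 0.6),
    "said":    (0.8, 0.2, 0.1, 0.2),
    "also":    (0.7, 0.1, 0.0, 0.4),
    "genome":  (0.2, 0.9, 0.9, 0.8),
    "protein": (0.3, 0.8, 0.9, 0.6),
    "enzyme":  (0.0, 0.6, 0.7, 0.6),
}
RELEVANT = {"genome", "protein", "enzyme"}  # hypothetical gold labels


def score(features, weights):
    """Weighted sum of the heuristic scores of one candidate word."""
    return sum(w * f for w, f in zip(weights, features))


def select_top_k(candidates, weights, k):
    """Return the k candidate words with the highest weighted score."""
    ranked = sorted(candidates,
                    key=lambda word: score(candidates[word], weights),
                    reverse=True)
    return ranked[:k]


def fitness(weights, candidates, relevant, k):
    """Assumed fitness: share of the selected words that are relevant."""
    chosen = select_top_k(candidates, weights, k)
    return sum(1 for word in chosen if word in relevant) / k


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def evolve_weights(candidates, relevant, k, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Evolve a weight vector with truncation selection, one-point
    crossover, and Gaussian mutation (all assumed operators)."""
    rng = random.Random(seed)
    n = len(HEURISTICS)

    def fit(weights):
        return fitness(weights, candidates, relevant, k)

    # Start from the uniform baseline plus random weight vectors.
    population = [[1.0 / n] * n]
    while len(population) < pop_size:
        population.append(normalize([rng.random() for _ in range(n)]))

    for _ in range(generations):
        # Keep the better half unchanged (elitism), refill with children.
        population.sort(key=fit, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [max(0.0, w + rng.gauss(0, 0.1))
                     if rng.random() < mutation_rate else w
                     for w in child]
            children.append(normalize(child))
        population = parents + children

    return max(population, key=fit)


if __name__ == "__main__":
    best = evolve_weights(CANDIDATES, RELEVANT, k=3)
    print(dict(zip(HEURISTICS, best)))
    print(select_top_k(CANDIDATES, best, k=3))
```

Because the better half of each generation is carried over unchanged and the uniform weight vector is part of the initial population, the returned weights never score worse under this fitness than weighting all four heuristics equally. In the toy data, frequency alone favors common but uninformative words, which is the kind of single-heuristic bias the abstract describes.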