A Feature Selection Method Based on Improved TFIDF

Feature selection is an effective way to reduce the dimensionality of the feature vectors in a text categorization system. After analyzing several common evaluation functions for feature selection, we apply a term weighting function to the feature selection task. We present a new evaluation function based on an improved TFIDF method: category information is introduced into the feature items, and the feature items of the relevant categories are selected, which makes up for the shortcomings of TFIDF. Experiments show that the method is simple and feasible, and that it improves the efficiency of the selected feature subset.
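The general idea of scoring terms per category with a TFIDF-style weight can be illustrated with a minimal sketch. The abstract does not give the exact evaluation function, so everything below is an assumption made for illustration: the function name `select_features`, the smoothed IDF term, and the `concentration` factor (the share of a term's documents that fall in the category) are hypothetical and should not be read as the formula proposed in the paper.

```python
import math
from collections import Counter, defaultdict


def select_features(docs, labels, k):
    """Return the top-k terms per category under a category-aware TF-IDF score.

    docs   : list of token lists, one per document
    labels : list of category labels, one per document
    k      : number of terms to keep for each category

    NOTE: the scoring formula is an illustrative assumption, not the
    evaluation function defined in the paper.
    """
    n_docs = len(docs)
    df = Counter()                     # document frequency over the whole corpus
    df_in_cat = defaultdict(Counter)   # document frequency inside each category
    tf_in_cat = defaultdict(Counter)   # raw term counts inside each category

    for tokens, label in zip(docs, labels):
        tf_in_cat[label].update(tokens)
        for term in set(tokens):
            df[term] += 1
            df_in_cat[label][term] += 1

    selected = {}
    for label, tf in tf_in_cat.items():
        scores = {}
        for term, count in tf.items():
            # Classic IDF, shifted by 1 so terms present in every document
            # still receive a positive weight.
            idf = math.log(n_docs / df[term]) + 1.0
            # Assumed category factor: fraction of the term's documents
            # that belong to this category (1.0 = exclusive to it).
            concentration = df_in_cat[label][term] / df[term]
            scores[term] = count * idf * concentration
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[label] = ranked[:k]
    return selected


if __name__ == "__main__":
    docs = [
        ["goal", "match", "team"],
        ["team", "goal", "goal"],
        ["stock", "market", "team"],
        ["market", "stock", "stock"],
    ]
    labels = ["sport", "sport", "finance", "finance"]
    print(select_features(docs, labels, k=2))
```

In this toy corpus the term "team" occurs in both categories, so its concentration factor is below 1 and it is ranked below the category-specific terms. The final feature set for a classifier could then be taken as the union of the per-category lists.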