Design of a Chinese Text Categorization Classifier Based on Attribute Bagging

To improve the precision and recall of a Chinese text classifier, this paper applies attribute bagging, an improved variant of the bagging algorithm. Documents are represented with the vector space model, and information gain is used for feature selection. Multiple training sets are obtained by re-sampling the attributes, and kNN serves as the individual classifier. The final classification result is obtained by voting. Experiments show that attribute bagging achieves lower error and better performance than both bagging and kNN in Chinese text categorization.
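
The pipeline described above (attribute re-sampling, kNN base classifiers, majority voting) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the toy term-weight vectors, the squared Euclidean distance, and the parameter values (`n_classifiers`, `subset_size`, `k`) are all assumptions chosen for illustration, and the information gain feature selection step is assumed to have already produced the four features shown.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, features, k=3):
    """Classify x with kNN, comparing documents only on the given attributes."""
    def sq_dist(a, b):
        # Squared Euclidean distance restricted to the attribute subset
        # (an assumption; a cosine measure could be used instead).
        return sum((a[j] - b[j]) ** 2 for j in features)

    # Indices of the k training documents closest to x.
    nearest = sorted(range(len(train_X)),
                     key=lambda i: sq_dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def attribute_bagging_predict(train_X, train_y, x, n_classifiers=11,
                              subset_size=2, k=3, seed=0):
    """Combine kNN classifiers, each built on a random attribute subset."""
    rng = random.Random(seed)
    n_features = len(train_X[0])
    votes = Counter()
    for _ in range(n_classifiers):
        # Re-sample the attributes (not the documents) for this classifier.
        features = rng.sample(range(n_features), subset_size)
        votes[knn_predict(train_X, train_y, x, features, k)] += 1
    # Majority vote; ties go to the label that was voted for first.
    return votes.most_common(1)[0][0]


# Hypothetical term-weight vectors: four features kept after feature selection.
TRAIN_X = [
    [1.0, 0.9, 0.0, 0.1], [0.8, 1.0, 0.1, 0.0], [0.9, 0.8, 0.0, 0.0],
    [0.0, 0.1, 1.0, 0.9], [0.1, 0.0, 0.8, 1.0], [0.0, 0.0, 0.9, 0.8],
]
TRAIN_Y = ["sports", "sports", "sports", "finance", "finance", "finance"]

if __name__ == "__main__":
    print(attribute_bagging_predict(TRAIN_X, TRAIN_Y, [0.9, 0.9, 0.1, 0.1]))
```

Because each base classifier sees a different attribute subset, the individual kNN classifiers make partly independent errors, which is what the voting step exploits. Ordinary bagging would instead re-sample the training documents and keep all attributes.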