Study on Feature Selection in Chinese Text Categorization
This paper introduces and compares eight feature selection methods for text categorization. Two of them are proposed here: Multi-Class Odds Ratio (MC OR), a variant of the Odds Ratio method commonly used in binary classification, and a new feature selection method based on Class Discriminating Words (CDW). Using the classic vector space model (VSM) classifier based on cosine similarity and the Naïve Bayes classifier, training and testing are carried out on two text collections with different class distributions. The results show that MC OR and CDW achieve the best feature selection performance.
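The abstract does not give the MC OR formula. One common way to extend binary Odds Ratio to many classes is to compute it once per class in a one-vs-rest split and combine the results with class-prior weights. The Python sketch below assumes that formulation: MC OR(t) = Σ_i P(C_i) · |log OR(t, C_i)|. It is not necessarily the paper's exact definition. The function name `multiclass_odds_ratio`, the document-presence counting, and the Laplace `smoothing` parameter are all illustrative assumptions.

```python
import math
from collections import Counter


def multiclass_odds_ratio(docs, labels, smoothing=1.0):
    """Hypothetical MC OR sketch: prior-weighted sum of |log odds ratio| per class.

    docs:   list of sets of terms (a term counts once per document)
    labels: list of class labels, aligned with docs
    Assumed formulation: score(t) = sum_c P(c) * |log OR(t, c)|, one-vs-rest.
    """
    n_docs = len(docs)
    class_counts = Counter(labels)
    vocab = set().union(*docs)

    # df[c][t] = number of documents in class c that contain term t
    df = {c: Counter() for c in class_counts}
    for terms, c in zip(docs, labels):
        df[c].update(terms)
    total_df = Counter()
    for c in df:
        total_df.update(df[c])

    scores = {}
    for t in vocab:
        score = 0.0
        for c, n_c in class_counts.items():
            n_rest = n_docs - n_c
            # Laplace-smoothed P(t | c) and P(t | not c), kept strictly in (0, 1)
            p_pos = (df[c][t] + smoothing) / (n_c + 2 * smoothing)
            p_neg = (total_df[t] - df[c][t] + smoothing) / (n_rest + 2 * smoothing)
            log_or = math.log(p_pos * (1 - p_neg) / ((1 - p_pos) * p_neg))
            # Weight by the class prior P(c); abs() rewards both positive
            # and negative association with the class
            score += (n_c / n_docs) * abs(log_or)
        scores[t] = score
    return scores


docs = [
    {"stock", "market"},
    {"stock", "fund"},
    {"football", "match"},
    {"basketball", "match"},
]
labels = ["finance", "finance", "sports", "sports"]
scores = multiclass_odds_ratio(docs, labels)
top = sorted(scores, key=scores.get, reverse=True)[:2]
```

In this toy corpus, terms that appear in every document of one class and no document of the other ("stock", "match") score higher than terms that appear in only one document ("market"). Keeping the top-scoring terms is then the feature selection step.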