An effective class-centroid-based dimension reduction method for text classification
Motivated by the effectiveness of centroid-based text classification techniques, we propose a classification-oriented, class-centroid-based dimension reduction (DR) method called CentroidDR. CentroidDR projects high-dimensional documents into a low-dimensional space spanned by the class centroids, so the reduced space has one dimension per class. In this space, the centroid-based classifier is essentially CentroidDR followed by a simple linear classifier. Other classification techniques, such as K-Nearest Neighbor (KNN) classifiers, can replace the simple linear classifier to form much more effective text classification algorithms. Although CentroidDR is simple, non-parametric, and runs in linear time, preliminary experimental results show that it can improve classifier accuracy and performs better than general DR methods such as Latent Semantic Indexing (LSI).
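The projection described above can be illustrated with a minimal sketch. The abstract does not specify the exact projection or weighting, so this example assumes unit-normalized term vectors and uses cosine similarity to each class centroid as the reduced coordinates. The function names (`class_centroids`, `project`, `centroid_classify`, `knn_classify`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_centroids(docs, labels):
    """Return (sorted class list, one unit-length mean vector per class)."""
    groups = defaultdict(list)
    for doc, label in zip(docs, labels):
        groups[label].append(normalize(doc))
    classes = sorted(groups)
    centroids = []
    for c in classes:
        members = groups[c]
        mean = [sum(col) / len(members) for col in zip(*members)]
        centroids.append(normalize(mean))
    return classes, centroids


def project(doc, centroids):
    """Assumed projection: one coordinate per class, the cosine similarity to its centroid."""
    unit = normalize(doc)
    return [sum(a * b for a, b in zip(unit, c)) for c in centroids]


def centroid_classify(doc, classes, centroids):
    """Centroid-based classifier: pick the class with the largest reduced coordinate."""
    coords = project(doc, centroids)
    return classes[coords.index(max(coords))]


def knn_classify(doc, train_coords, train_labels, centroids, k=3):
    """KNN in the reduced space: squared Euclidean distance, majority vote."""
    q = project(doc, centroids)
    ranked = sorted(
        zip(train_coords, train_labels),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(q, pair[0])),
    )
    votes = defaultdict(int)
    for _, label in ranked[:k]:
        votes[label] += 1
    return max(votes, key=votes.get)


# Toy term-frequency vectors over a 4-term vocabulary (made-up data).
train_docs = [[3, 1, 0, 0], [2, 2, 0, 1], [0, 0, 4, 1], [0, 1, 2, 3]]
train_labels = ["sports", "sports", "tech", "tech"]

classes, centroids = class_centroids(train_docs, train_labels)
train_coords = [project(d, centroids) for d in train_docs]

query = [1, 1, 0, 0]
print(len(train_coords[0]))  # one reduced dimension per class
print(centroid_classify(query, classes, centroids))
print(knn_classify(query, train_coords, train_labels, centroids, k=3))
```

Under these assumptions, computing the centroids and projecting each document takes a single pass over the training data, which is consistent with the linear running time claimed above. Swapping `centroid_classify` for `knn_classify` corresponds to replacing the simple linear classifier with KNN in the reduced space.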