An effective class-centroid-based dimension reduction method for text classification

Motivated by the effectiveness of centroid-based text classification techniques, we propose a classification-oriented, class-centroid-based dimension reduction (DR) method called CentroidDR. CentroidDR projects high-dimensional documents into a low-dimensional space spanned by the class centroids. In this class-centroid-based space, the centroid-based classifier is essentially CentroidDR followed by a simple linear classifier. Other classification techniques, such as K-Nearest Neighbor (KNN) classifiers, can replace the simple linear classifier to form much more effective text classification algorithms. Although CentroidDR is simple, non-parametric, and runs in linear time, preliminary experimental results show that it can improve classifier accuracy and perform better than general DR methods such as Latent Semantic Indexing (LSI).
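
The projection described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: documents are rows of a dense term-weight matrix, each class centroid is the unit-normalized mean of its class's documents, and a document's reduced coordinates are its inner products with those centroids. The function names (`fit_centroids`, `centroid_dr`, `centroid_classify`, `knn_classify`) and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_centroids(X, y):
    """Return (classes, C); row i of C is the unit-normalized mean of class i.

    Assumption: centroids are arithmetic means, then L2-normalized.
    """
    classes = np.unique(y)
    C = np.vstack([X[y == c].mean(axis=0) for c in classes])
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    return classes, C / np.where(norms == 0, 1.0, norms)

def centroid_dr(X, C):
    """Map each document to one coordinate per class (inner product with centroid)."""
    return X @ C.T

def centroid_classify(Z, classes):
    """Simple linear rule in the reduced space: pick the largest coordinate."""
    return classes[np.argmax(Z, axis=1)]

def knn_classify(Z_train, y_train, Z_test, k=3):
    """Plain majority-vote KNN with Euclidean distance in the reduced space."""
    dists = np.linalg.norm(Z_test[:, None, :] - Z_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for row in nearest:
        labels, counts = np.unique(y_train[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Toy data (hypothetical): 4 training documents, 5 terms, 2 classes.
X_train = np.array([[2, 1, 0, 0, 0],
                    [1, 2, 0, 0, 1],
                    [0, 0, 2, 1, 0],
                    [0, 0, 1, 2, 1]], dtype=float)
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[1, 1, 0, 0, 0],
                   [0, 0, 1, 1, 0]], dtype=float)

classes, C = fit_centroids(X_train, y_train)
Z_train = centroid_dr(X_train, C)   # 5 term dimensions reduced to 2
Z_test = centroid_dr(X_test, C)

linear_pred = centroid_classify(Z_test, classes)
knn_pred = knn_classify(Z_train, y_train, Z_test, k=3)
```

In this sketch the reduced dimensionality equals the number of classes, and taking the largest coordinate reproduces a centroid-based classifier under the stated inner-product assumption. Swapping in `knn_classify` corresponds to replacing the simple linear classifier with KNN on the reduced features.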