Enhanced Centroid-Based Classification Technique by Filtering Outliers

Document clustering, or unsupervised document classification, has been used to enhance information retrieval. Recently, it has become an intense area of research due to its practical importance. Outliers are elements whose similarity to the centroid of their category falls below some threshold value. In this paper, we show that excluding outliers from noisy training data significantly improves the performance of the centroid-based classifier, which is the best known method. The proposed method performs about 10% better than the centroid-based classifier.
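
The idea of filtering outliers before building centroids can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions that the abstract does not state: documents are already represented as numeric vectors (for example, TF-IDF), similarity is cosine similarity, each centroid is recomputed once after filtering, and the threshold is a free parameter. The function names `train_centroids` and `predict` are hypothetical.

```python
import numpy as np


def _normalize(X):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return X / norms


def train_centroids(X, y, threshold=None):
    """Build one centroid per category.

    If `threshold` is given (an assumed free parameter), documents whose
    cosine similarity to their category's initial centroid is below it are
    treated as outliers, and the centroid is recomputed without them.
    """
    X = _normalize(np.asarray(X, dtype=float))
    y = np.asarray(y)
    centroids = {}
    for label in sorted(set(y.tolist())):
        docs = X[y == label]
        centroid = docs.mean(axis=0)
        c_norm = np.linalg.norm(centroid)
        if threshold is not None and c_norm > 0:
            # Rows are unit vectors, so this is the cosine similarity.
            sims = docs @ centroid / c_norm
            kept = docs[sims >= threshold]
            # Keep the unfiltered centroid if every document was rejected.
            if len(kept) > 0:
                centroid = kept.mean(axis=0)
        centroids[label] = centroid
    return centroids


def predict(centroids, X):
    """Assign each document to the category with the most similar centroid."""
    X = _normalize(np.asarray(X, dtype=float))
    labels = list(centroids)
    C = _normalize(np.stack([centroids[label] for label in labels]))
    return [labels[i] for i in np.argmax(X @ C.T, axis=1)]


if __name__ == "__main__":
    # Toy data: the third document carries label 0 but resembles category 1.
    X_train = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0], [0.1, 1.0]]
    y_train = [0, 0, 0, 1, 1]
    model = train_centroids(X_train, y_train, threshold=0.6)
    print(predict(model, [[1.0, 0.0], [0.0, 1.0]]))
```

Calling `train_centroids` with `threshold=None` gives the plain centroid-based classifier, so the two variants can be compared on the same data. The threshold value used here is arbitrary and would need to be tuned on held-out data.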