Local Latent Semantic Analysis Based on Support Vector Machine for Imbalanced Text Categorization

Many text categorization tasks involve imbalanced training examples. We tackle this problem with an improved local Latent Semantic Analysis (LSA). LSA has been shown to be extremely useful, but it is not an optimal representation for text categorization: as an unsupervised method, it concentrates on representation and ignores class discrimination. Several local LSA methods have been proposed to improve classification by exploiting class discrimination information. In this paper, we use a support vector machine (SVM) to generate the imbalanced dataset used as the local regions for local LSA. Experimental results show that our method classifies better than global LSA and traditional local LSA methods while using a much smaller LSA dimension.
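
As a rough illustration of the idea, local LSA with an SVM-selected region can be written as follows. This is a minimal sketch in generic notation, not the formulation used in the paper: the symbols X, f_c, \theta, R_c and k are all assumptions introduced here, and selecting the region by thresholding the SVM decision value is only one plausible rule.

```latex
% Assumed notation: X is the m x n term-document matrix, d_j its j-th column.
% A linear SVM trained for class c gives the decision function
\[
  f_c(d) = w_c^{\top} d + b_c .
\]
% Hypothetical region rule: keep the documents whose score exceeds a threshold \theta.
\[
  R_c = \{\, d_j : f_c(d_j) \ge \theta \,\}, \qquad
  X_c = [\, d_j \,]_{d_j \in R_c} .
\]
% Local LSA: rank-k truncated SVD of the local matrix only.
\[
  X_c \approx U_k \Sigma_k V_k^{\top} .
\]
% A document d is folded into the local k-dimensional space by
\[
  \hat{d} = \Sigma_k^{-1} U_k^{\top} d .
\]
```

Global LSA corresponds to the special case X_c = X, where the decomposition sees no class information. Restricting the SVD to R_c is what lets the latent dimensions reflect the structure around class c.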