Generating Different Semantic Spaces for Document Classification

Document classification is an important technique for digital libraries, web pages, and similar collections. Because of synonymy and polysemy, it is better to classify documents based on their latent semantics. A local semantic basis, which captures the features of documents within a particular category, has more discriminative power and is more effective for classification than a global semantic basis, which captures the features common to all available documents. Because the semantic basis obtained by nonnegative matrix factorization (NMF) corresponds directly to the samples, whereas the basis obtained by singular value decomposition (SVD) does not, NMF is well suited to obtaining local semantic bases. In this paper, we compare global and local semantic bases obtained by SVD and NMF. The experimental results show that the local semantic basis obtained by NMF achieves the best classification accuracy.
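
To make the contrast between global and local bases concrete, the following is a minimal sketch and not the procedure evaluated in the paper. It assumes toy term-document matrices invented for illustration, a plain multiplicative-update NMF, and a hypothetical classification rule that assigns a document to the category whose local basis reconstructs it with the smallest least-squares residual. The names nmf, residual, classify, global_basis, and local_bases are illustrative only.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Multiplicative-update NMF: V (terms x docs) ~= W @ H, with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 1e-3
    H = rng.random((k, V.shape[1])) + 1e-3
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def residual(W, d):
    """Distance from document vector d to the span of the basis columns of W."""
    coef, *_ = np.linalg.lstsq(W, d, rcond=None)
    return np.linalg.norm(d - W @ coef)

# Toy data (assumption): rows are terms, columns are documents.
# Category A uses only the first three terms, category B only the last three.
A = np.array([[2, 1, 3],
              [1, 2, 1],
              [1, 1, 2],
              [0, 0, 0],
              [0, 0, 0],
              [0, 0, 0]], dtype=float)
B = A[::-1].copy()

# Global basis: truncated SVD over the documents of all categories together.
U, s, Vt = np.linalg.svd(np.hstack([A, B]), full_matrices=False)
global_basis = U[:, :2]

# Local bases: one NMF per category, fitted on that category's documents only.
local_bases = {"A": nmf(A, 2)[0], "B": nmf(B, 2)[0]}

def classify(d):
    """Pick the category whose local basis leaves the smallest residual."""
    return min(local_bases, key=lambda c: residual(local_bases[c], d))

query = np.array([0, 0, 0, 1, 2, 1], dtype=float)
label = classify(query)
```

In this sketch each column of a local NMF basis is a nonnegative combination of terms from a single category, which is one way to read the direct correspondence with samples mentioned above. The columns of the SVD basis are orthonormal and computed over the pooled documents of both categories, so they are not tied to any one category.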