A Study of Text Classification Based on Concept Space

Extending the Vector Space Model (VSM) and Latent Semantic Indexing (LSI), this paper proposes a text classification method based on Concept Space. Information gain is applied to acquire concepts from a large training set. Concept Space is built by acquiring the latent semantic indexing data, constructing a latent semantic space with LSI, and then adding the class-basis vector. Methods are presented for calculating word similarity, text similarity, and the similarity between a text vector and the class-basis vector in Concept Space. Experimental results show that the Concept Space method outperforms the Vector Space Model. The paper also discusses future work, namely the problem of concept space learning.
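As an illustration of the concept-acquisition step, the following is a minimal sketch of ranking terms by information gain, assuming the standard definition IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)] over term presence. The abstract does not give the exact formulation used in the paper, so the function names `entropy` and `information_gain` and the toy corpus are hypothetical and serve only to show the idea.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Class entropy minus the expected entropy after splitting on term presence.

    Assumption: each document is represented as a set of terms.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


# Toy, made-up training set (not data from the paper).
docs = [
    {"goal", "match", "team"},
    {"team", "score", "goal"},
    {"stock", "market", "price"},
    {"market", "team", "price"},
]
labels = ["sports", "sports", "finance", "finance"]

# Rank the vocabulary so that the most class-discriminating terms come first.
vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
print(ranked)
```

In this toy set, a term such as "goal" that occurs in only one class receives the maximum gain of one bit, while "team", which occurs in both classes, scores lower. A selection step would keep only the top-ranked terms as candidate concepts before the latent semantic space is built.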