Text classification is challenging because text data are high-dimensional and sparse, and because natural language has complex semantics. Documents are typically represented in a vector space using the bag-of-words (BoW) model. Although BoW is easy to use, it does not account for semantic similarity between words. In this paper, we address this shortcoming by enriching the BoW representation with the exponential kernel, which models semantic similarity by means of a diffusion process on a graph defined by lexicon and co-occurrence information. Experimental evaluation on real data sets shows that our approach, combined with a support vector machine (SVM), achieves higher classification accuracy than the plain BoW approach.
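To illustrate the idea of enriching BoW with a diffusion-based kernel, here is a minimal sketch. It is not the authors' implementation: it assumes the common form S = exp(beta * A), where A is a symmetric word-word adjacency matrix, and a document kernel K = X S X^T. The vocabulary, the graph, the documents, the value of `beta`, and the function names `exponential_kernel` and `semantic_gram` are all hypothetical choices made for this example.

```python
import numpy as np


def exponential_kernel(adjacency, beta=0.5):
    """Return exp(beta * A) for a symmetric word-word matrix A (assumed form)."""
    A = np.asarray(adjacency, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("adjacency must be symmetric")
    # For symmetric A = V diag(w) V^T, exp(beta*A) = V diag(exp(beta*w)) V^T.
    w, V = np.linalg.eigh(A)
    return (V * np.exp(beta * w)) @ V.T


def semantic_gram(X, S):
    """Document-by-document kernel K = X S X^T with cosine normalisation."""
    K = X @ S @ X.T
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


# Hypothetical toy vocabulary and word graph: "car" is linked to
# "automobile" and "engine"; "banana" is isolated.
vocab = ["car", "automobile", "engine", "banana"]
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)

# Hypothetical BoW term-count vectors, one row per document.
X = np.array([
    [1, 0, 0, 0],  # doc 0 mentions only "car"
    [0, 1, 0, 0],  # doc 1 mentions only "automobile"
    [0, 0, 0, 1],  # doc 2 mentions only "banana"
], dtype=float)

S = exponential_kernel(A, beta=0.5)
K_bow = semantic_gram(X, np.eye(len(vocab)))  # plain BoW cosine similarity
K_sem = semantic_gram(X, S)                   # diffusion-enriched similarity

print("BoW similarity doc0/doc1:     ", K_bow[0, 1])
print("Semantic similarity doc0/doc1:", K_sem[0, 1])
print("Semantic similarity doc0/doc2:", K_sem[0, 2])
```

Under plain BoW, documents 0 and 1 share no terms and have zero similarity. With the diffusion-based matrix, they become similar because their words are neighbours in the graph, while the document about the isolated word stays dissimilar. A Gram matrix like `K_sem` could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.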