Semantic Pattern Tree Kernels for Short-Text Classification

Kernel methods are widely used for document classification across many domains. Popular kernels such as bag-of-words kernels and tree kernels give satisfactory results when classifying documents such as articles, e-mails, and web pages. However, they perform less well on short-text documents, because a short document yields too few features for a reliable comparison. To address this problem, this paper presents a novel kernel function, the semantic pattern tree kernel, for classifying short-text documents. The proposed kernel extends the feature space of each document by incorporating syntactic and semantic information through three levels of semantic annotation. Experiments on the Open Directory Project dataset show that the semantic pattern tree kernel achieves higher accuracy than conventional kernels when classifying short-text documents.
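
The sparsity problem and the effect of extending the feature space can be illustrated with a minimal sketch. It is not the semantic pattern tree kernel itself: it uses a flat bag-of-words kernel, and the word-to-concept lexicon `CONCEPTS`, the function names, and the two example texts are all hypothetical, chosen only to show how added semantic features can give two short texts a non-zero similarity.

```python
from collections import Counter

# Hypothetical word-to-concept lexicon, for illustration only. The three
# annotation levels of the proposed kernel are not reproduced here.
CONCEPTS = {
    "laptop": "COMPUTER",
    "notebook": "COMPUTER",
    "cheap": "PRICE",
    "discount": "PRICE",
}

def bow_features(text):
    """Plain bag-of-words term counts."""
    return Counter(text.lower().split())

def enriched_features(text):
    """Bag-of-words counts plus one count per matched concept label."""
    feats = bow_features(text)
    for word in text.lower().split():
        if word in CONCEPTS:
            feats["concept:" + CONCEPTS[word]] += 1
    return feats

def linear_kernel(a, b):
    """Dot product of two sparse count vectors."""
    return sum(count * b[term] for term, count in a.items())

doc1 = "cheap laptop deals"
doc2 = "notebook discount offers"

# The two texts share no surface words, so the plain kernel sees no overlap.
plain = linear_kernel(bow_features(doc1), bow_features(doc2))

# With concept features added, the shared concepts contribute to the kernel.
enriched = linear_kernel(enriched_features(doc1), enriched_features(doc2))

print(plain, enriched)
```

In this toy case the plain kernel value is zero because the texts have no words in common, while the enriched kernel is positive because both texts map to the same two concept labels.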