Research on short text classification for web forums

The unique characteristics of short text make its classification quite different from traditional long-text processing. The feature space of short text is very sparse, which makes it notoriously difficult to extract sufficient and effective features. In this paper, aiming to classify short texts on web forums accurately, we introduce a novel short-text-processing method based on semantic extension, which enriches the content of the original short text and effectively addresses the problem of feature sparsity. In addition, we put forward the concept of the Key-Pattern (KP) and propose a new text feature representation approach based on KP, which extracts phrases carrying strong semantic information as text features. Traditional classifier models are then applied to predict each text's category. Experimental results show that the proposed method effectively improves the accuracy and recall of short text classification.
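
The two ideas named above, semantic extension and phrase-level features, can be illustrated with a minimal Python sketch. This is not the method of the paper: the related-term table `RELATED_TERMS`, the function names, and the use of plain two-word phrases as a stand-in for Key-Patterns are all assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical related-term table (an assumption for illustration);
# the source of semantic extension used in the paper is not shown here.
RELATED_TERMS = {
    "laptop": ["computer", "notebook"],
    "gpu": ["graphics", "card"],
}


def extend_text(tokens, related=RELATED_TERMS):
    """Append related terms for each token to enrich a sparse short text."""
    extended = list(tokens)
    for tok in tokens:
        extended.extend(related.get(tok, []))
    return extended


def phrase_features(tokens, n=2):
    """Count contiguous n-token phrases (a simple stand-in for Key-Patterns)."""
    phrases = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(phrases)


def build_features(text):
    """Combine extended unigram counts with phrase counts for one short text."""
    tokens = text.lower().split()
    features = Counter(extend_text(tokens))
    features.update(phrase_features(tokens))
    return features


features = build_features("Laptop gpu overheating")
```

The resulting counts could be turned into vectors and passed to any traditional classifier. The sketch only shows how extension adds terms that the original post never mentioned, and how phrases add features beyond single words.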