Abstract The accuracy of traditional short-text classification relies heavily on statistical feature selection. Because short texts are inherently brief, carry weak signals, and offer few features, feature extension tends to introduce noise words, which in turn degrade classification accuracy. To address this problem, this paper proposes a semantic dictionary method for short-text classification. The method builds a set of domain dictionaries by analyzing the specific characteristics of a given field. Each word's weight in the dictionary is set according to the correlation between the word and the category, which improves classification accuracy to some extent. To increase the dictionary's vocabulary coverage, association rules are then used to extend the semantic dictionary automatically. Finally, an experiment on micro-blog data shows that the method is effective.
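The pipeline outlined in the abstract (a weighted domain dictionary, extension by association rules, then dictionary-based classification) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The weight P(category | word), the inheritance of a seed word's weights scaled by rule confidence, and the function names and thresholds `build_dictionary`, `extend_dictionary`, `classify`, `min_support`, and `min_conf` are all assumptions made for illustration, as is the toy data.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_dictionary(labeled_docs):
    """Build {word: {category: weight}} from (tokens, category) pairs.

    Assumption: the word-category correlation is approximated by
    P(category | word), estimated from document counts.
    """
    word_cat = defaultdict(Counter)
    for tokens, category in labeled_docs:
        for word in set(tokens):
            word_cat[word][category] += 1
    dictionary = {}
    for word, counts in word_cat.items():
        total = sum(counts.values())
        dictionary[word] = {c: n / total for c, n in counts.items()}
    return dictionary


def extend_dictionary(dictionary, unlabeled_docs, min_support=2, min_conf=0.6):
    """Add new words via rules a -> b, where a is a known word and b is not.

    Assumption: b inherits a's weights scaled by the rule confidence,
    keeping the strongest weight per category.
    """
    word_count = Counter()
    pair_count = Counter()
    for tokens in unlabeled_docs:
        unique = set(tokens)
        word_count.update(unique)
        pair_count.update(permutations(unique, 2))
    extended = {w: dict(ws) for w, ws in dictionary.items()}
    for (a, b), support in pair_count.items():
        if a not in dictionary or b in dictionary or support < min_support:
            continue
        confidence = support / word_count[a]
        if confidence < min_conf:
            continue
        weights = extended.setdefault(b, {})
        for category, weight in dictionary[a].items():
            weights[category] = max(weights.get(category, 0.0), weight * confidence)
    return extended


def classify(tokens, dictionary):
    """Return the category with the largest summed weight, or None."""
    scores = Counter()
    for word in tokens:
        for category, weight in dictionary.get(word, {}).items():
            scores[category] += weight
    return max(scores, key=scores.get) if scores else None


# Toy data, invented for illustration only.
labeled = [
    (["goal", "match", "team"], "sports"),
    (["team", "coach", "win"], "sports"),
    (["stock", "market", "bank"], "finance"),
    (["bank", "loan", "rate"], "finance"),
]
unlabeled = [
    ["goal", "striker"],
    ["goal", "striker", "fans"],
    ["loan", "mortgage"],
    ["loan", "mortgage", "rate"],
]

base = build_dictionary(labeled)
extended = extend_dictionary(base, unlabeled)
print(classify(["striker", "fans"], base))      # no dictionary word matches
print(classify(["striker", "fans"], extended))  # "striker" was added by a rule
```

In this sketch the support and confidence thresholds are what keep noise words out of the extended dictionary: "fans" co-occurs with "goal" only once, so it is not added.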