Linguistic Techniques to Improve the Performance of Automatic Text Categorization
This paper presents a method for incorporating natural language processing into existing text categorization procedures. The investigation considers three aspects: (i) a method for weighting terms based on the concept of a probability-weighted amount of information, (ii) estimation of term occurrence probabilities using a probabilistic language model, and (iii) automatic extraction of terms based on part-of-speech (POS) tags generated automatically by a morphological analyzer. The effects of these three factors are examined in experiments using the Reuters-21578 and NTCIR-J1 standard test collections.
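The abstract does not give the weighting formula, so the following is only an illustrative sketch of aspect (i) under one plausible reading: each term-category pair is weighted by its joint probability times the log ratio of that joint probability to the product of the marginals, that is, the pair's contribution to mutual information. The function name `pwi_weights`, the input layout, and the use of raw relative frequencies are all assumptions made for illustration. In particular, aspect (ii) of the paper estimates probabilities with a probabilistic language model, which this sketch does not attempt.

```python
import math
from collections import Counter


def pwi_weights(docs):
    """Hypothetical helper: weight each (term, category) pair by
    P(t, c) * log(P(t, c) / (P(t) * P(c))).

    docs is a list of (category, list_of_terms) pairs. Probabilities are
    plain relative frequencies over term occurrences (an assumption; the
    paper uses a probabilistic language model instead).
    """
    joint = Counter()      # occurrences of term t in documents of category c
    term_total = Counter() # occurrences of term t overall
    cat_total = Counter()  # term occurrences per category
    for category, terms in docs:
        for term in terms:
            joint[(term, category)] += 1
            term_total[term] += 1
            cat_total[category] += 1

    n = sum(term_total.values())  # total number of term occurrences
    weights = {}
    for (term, category), count in joint.items():
        p_tc = count / n
        p_t = term_total[term] / n
        p_c = cat_total[category] / n
        # Probability of the pair times the information it carries.
        weights[(term, category)] = p_tc * math.log(p_tc / (p_t * p_c))
    return weights


if __name__ == "__main__":
    toy = [
        ("grain", ["wheat", "price"]),
        ("trade", ["tariff", "price"]),
    ]
    for pair, weight in sorted(pwi_weights(toy).items()):
        print(pair, round(weight, 4))
```

In this toy input, a term that appears in only one category ("wheat") receives a positive weight, while a term spread evenly across both categories ("price") receives a weight of zero, which is the behavior one would want from a category-discriminating term weight.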