Improved feature selection approach TFIDF in text mining

This paper describes TFIDF (term frequency-inverse document frequency) as a feature selection method. Using TFIDF, we process the source data and build a vector space model, which provides a convenient data structure for text categorization. We then measure the method's precision from the categorization results. Based on these empirical results, we analyze the method's strengths and weaknesses and propose a new TFIDF-based feature selection approach intended to improve its accuracy.
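The abstract does not specify the weighting formula, the feature-selection rule, or the classifier. The sketch below therefore assumes the classic weighting w(t, d) = tf(t, d) × log(N / df(t)), a hypothetical top-k rule that keeps the terms with the largest summed TF-IDF weight, and a cosine-similarity centroid (Rocchio-style) classifier. It illustrates the baseline pipeline the abstract describes (TFIDF weighting, vector space model, categorization, precision), not the paper's improved approach. All function names and the toy corpus are invented for illustration.

```python
import math
from collections import Counter

def vectorize(doc, df, n_docs):
    """TF-IDF vector {term: weight} for one tokenized document.

    Assumed weighting: raw term count * log(n_docs / df[term]).
    Terms never seen in training (df == 0) are dropped.
    """
    tf = Counter(doc)
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items() if df[t] > 0}

def tfidf_vectors(docs):
    """Document frequencies and TF-IDF vectors for a training corpus."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each term counted at most once per document
    return [vectorize(doc, df, len(docs)) for doc in docs], df

def select_features(vectors, k):
    """Hypothetical rule: keep the k terms with the largest summed TF-IDF weight."""
    totals = Counter()
    for vec in vectors:
        totals.update(vec)
    return {t for t, _ in totals.most_common(k)}

def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def train_centroids(vectors, labels, features):
    """Average each category's vectors over the selected features (Rocchio-style centroid)."""
    sums, counts = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        sums.setdefault(label, Counter()).update(
            {t: w for t, w in vec.items() if t in features})
    return {lab: {t: w / counts[lab] for t, w in acc.items()} for lab, acc in sums.items()}

def classify(vec, centroids, features):
    """Assign the category whose centroid is most cosine-similar to vec."""
    vec = {t: w for t, w in vec.items() if t in features}
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))

def precision(predicted, actual, category):
    """Share of documents assigned to `category` that truly belong to it."""
    assigned = [a for p, a in zip(predicted, actual) if p == category]
    return sum(a == category for a in assigned) / len(assigned) if assigned else 0.0

# Toy corpus (invented for illustration); "the" occurs in every training document.
train_docs = [["the", "stock", "market", "price"], ["the", "market", "trade", "price"],
              ["the", "team", "match", "goal"], ["the", "goal", "team", "coach"]]
train_labels = ["finance", "finance", "sports", "sports"]

vectors, df = tfidf_vectors(train_docs)
features = select_features(vectors, k=8)  # "the" has weight log(4/4) = 0 and is dropped
centroids = train_centroids(vectors, train_labels, features)

test_docs = [["the", "market", "price", "goal"], ["team", "coach", "price"],
             ["market", "team", "goal"]]
actual = ["finance", "sports", "finance"]  # the last document is deliberately ambiguous
predicted = [classify(vectorize(d, df, len(train_docs)), centroids, features)
             for d in test_docs]
print(predicted)                               # ['finance', 'sports', 'sports']
print(precision(predicted, actual, "sports"))  # 0.5
```

Because a term present in every training document gets weight log(N/N) = 0, "the" falls out of the selected features. The ambiguous third test document shows how a single misassignment lowers the precision of the "sports" category to 0.5.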