In real-world operational environments, most text classification systems suffer from insufficient training documents and a lack of prior knowledge about the feature space. In this regard, Naive Bayes is known to be an appropriate algorithm for operational text classification, since its classification model can evolve easily by incrementally updating the pre-learned model and feature space. This paper proposes a technique for improving the Naive Bayes classifier through a feature weighting strategy. The basic idea is that parameter estimation in Naive Bayes should consider the degree of feature importance as well as the feature distribution. We develop a more accurate classification model by incorporating feature weights into the Naive Bayes learning algorithm, rather than performing learning on a reduced feature set. In addition, we extend a conventional feature update algorithm to support incremental feature weighting in a dynamic operational environment. To evaluate the proposed method, we perform experiments on various document collections and show that the traditional Naive Bayes classifier can be significantly improved by the proposed technique.
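The core idea of weighting features during parameter estimation can be illustrated with a minimal sketch of a multinomial Naive Bayes trainer. This is not the authors' implementation: the function names (`train_weighted_nb`, `classify`), the externally supplied `feature_weights` dictionary, and the simple count-scaling scheme are illustrative assumptions; the paper's actual weighting strategy and incremental update algorithm are not reproduced here.

```python
import math
from collections import defaultdict

def train_weighted_nb(docs, labels, feature_weights, alpha=1.0):
    """Estimate multinomial Naive Bayes parameters with per-feature weights.

    Each term occurrence is scaled by its weight before Laplace smoothing,
    so more important features contribute more mass to P(term | class).
    (Hypothetical scheme for illustration, not the paper's exact method.)
    """
    classes = set(labels)
    class_doc_counts = defaultdict(int)
    # class -> term -> weighted occurrence count
    term_mass = defaultdict(lambda: defaultdict(float))
    vocab = set(feature_weights)

    for doc, label in zip(docs, labels):
        class_doc_counts[label] += 1
        for term in doc:
            if term in vocab:
                term_mass[label][term] += feature_weights[term]

    log_prior = {c: math.log(class_doc_counts[c] / len(docs)) for c in classes}
    log_likelihood = {}
    for c in classes:
        total = sum(term_mass[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            t: math.log((term_mass[c].get(t, 0.0) + alpha) / total)
            for t in vocab
        }
    return log_prior, log_likelihood

def classify(doc, log_prior, log_likelihood):
    """Pick the class maximizing the log-posterior; unseen terms are ignored."""
    scores = {
        c: log_prior[c] + sum(log_likelihood[c].get(t, 0.0) for t in doc)
        for c in log_prior
    }
    return max(scores, key=scores.get)
```

Because the weights enter only the sufficient statistics (the per-class weighted term counts), they can in principle be maintained incrementally as new documents and features arrive, which matches the operational setting the abstract describes.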