Paper information - Naïve Bayes text classification with positive features selected by statistical method

Text classification remains one of the most researched problems because of the continuously increasing volume of electronic documents and digital data. Naïve Bayes is a simple and effective classifier for data mining tasks, but it does not produce satisfactory results in automatic text classification. In this paper, the performance of the Naïve Bayes classifier is analyzed when it is trained on only the positive features selected by CHIR, a statistics-based method. Feature selection is the most important preprocessing step: it improves the efficiency and accuracy of text classification algorithms by removing redundant and irrelevant terms from the training corpus. Experiments were conducted on randomly selected training sets, and the performance of the classifier with words as features was analyzed. The proposed method achieves higher classification accuracy than other native methods on the 20 Newsgroups benchmark.
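
To make the idea of training on "positive features" concrete, the following is a minimal sketch and not the paper's implementation. It assumes a simplified stand-in for CHIR: a plain chi-square score per term and class, keeping a term only when it occurs in the class more often than independence would predict (positive dependency). The function names `positive_chi2_features`, `train_nb`, and `predict`, the `top_k` parameter, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def positive_chi2_features(docs, labels, top_k):
    """Per class, keep the top_k terms by chi-square score among terms
    that are positively dependent on the class (observed > expected).
    Assumption: plain chi-square is used here, not the CHIR statistic."""
    n = len(docs)
    class_count = Counter(labels)
    term_df = Counter()                  # document frequency of each term
    term_class_df = defaultdict(Counter)  # document frequency per class
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            term_df[term] += 1
            term_class_df[label][term] += 1

    selected = set()
    for cls, n_cls in class_count.items():
        scored = []
        for term, df in term_df.items():
            a = term_class_df[cls][term]  # in class, contains term
            b = df - a                    # outside class, contains term
            c = n_cls - a                 # in class, lacks term
            d = n - n_cls - b             # outside class, lacks term
            expected = df * n_cls / n
            if a <= expected:
                continue  # negative or no dependency: not a positive feature
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom == 0:
                continue
            scored.append((n * (a * d - b * c) ** 2 / denom, term))
        scored.sort(reverse=True)
        selected.update(term for _, term in scored[:top_k])
    return selected


def train_nb(docs, labels, vocab):
    """Multinomial Naive Bayes with Laplace smoothing over vocab only."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        term_counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for cls in class_docs:
        total = sum(term_counts[cls].values())
        log_prior = math.log(class_docs[cls] / len(docs))
        log_like = {
            t: math.log((term_counts[cls][t] + 1) / (total + len(vocab)))
            for t in vocab
        }
        model[cls] = (log_prior, log_like)
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior; terms outside
    the selected vocabulary are ignored."""
    def score(cls):
        log_prior, log_like = model[cls]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(model, key=score)


# Toy corpus (hypothetical), already tokenized.
docs = [
    ["engine", "car", "wheel"],
    ["car", "road", "engine"],
    ["rocket", "orbit", "moon"],
    ["moon", "orbit", "launch"],
]
labels = ["autos", "autos", "space", "space"]

vocab = positive_chi2_features(docs, labels, top_k=2)
model = train_nb(docs, labels, vocab)
print(sorted(vocab))
print(predict(model, ["car", "engine", "moon"]))
```

Restricting the vocabulary before training means the classifier never sees terms that are merely absent from a class, which is one plausible reading of why positive-only selection might help; the sketch makes no claim about reproducing the reported accuracy.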

K. R. Chandran | M. Janaki Meena
