A novel feature voting model for text classification

With the information explosion of the Internet era, traditional classification methods such as k-nearest neighbor (KNN) and Naive Bayes (NB) encounter bottlenecks caused by the endless stream of new words. In this paper, a comparison with the Rocchio and Bayesian algorithms shows that centroid-based algorithms are insufficient for text classification. Therefore, a novel feature voting model is proposed, which gives rise to a bag-of-words-based feature voting algorithm for text classification. The algorithm assigns categories to each document according to the ranking of the weighted sums of its feature values. Experimental results demonstrate the efficiency of the proposed method compared with other state-of-the-art methods.
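The voting step described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify how the feature weights are computed, so the sketch assumes that each word's vote for a category is the fraction of that word's training occurrences falling in the category. The function names `train_vote_weights` and `rank_categories`, the whitespace tokenizer, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def train_vote_weights(docs, labels):
    """Estimate a per-word vote weight for each category.

    Assumption (not from the paper): weight(word, category) is the
    fraction of the word's training occurrences seen in that category.
    """
    counts = defaultdict(Counter)  # word -> {category: occurrences}
    for text, label in zip(docs, labels):
        for word in text.lower().split():
            counts[word][label] += 1
    weights = {}
    for word, per_cat in counts.items():
        total = sum(per_cat.values())
        weights[word] = {cat: n / total for cat, n in per_cat.items()}
    return weights


def rank_categories(text, weights):
    """Rank categories by the weighted sum of the document's feature values.

    The feature value here is the raw term frequency in the bag of words;
    words never seen in training cast no vote.
    """
    scores = Counter()
    for word, tf in Counter(text.lower().split()).items():
        for cat, w in weights.get(word, {}).items():
            scores[cat] += tf * w
    return scores.most_common()  # highest-scoring category first


# Toy training data (hypothetical).
docs = [
    "cheap pills buy now",
    "meeting agenda for monday",
    "buy cheap watches",
    "project meeting notes",
]
labels = ["spam", "ham", "spam", "ham"]

weights = train_vote_weights(docs, labels)
ranking = rank_categories("buy cheap meeting pills", weights)
print(ranking)  # [('spam', 3.0), ('ham', 1.0)]
```

In this toy run, "buy", "cheap", and "pills" each cast a full vote for "spam" and "meeting" casts one for "ham", so the document is assigned to "spam". A document made only of unseen words receives an empty ranking, which is one way the new-word problem mentioned above shows up in a bag-of-words model.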
