A Comparison of Text Classification Methods k-NN, Naïve Bayes, and Support Vector Machine for News Classification

In this era, a rapid thriving Internet occasionally complicates users to retrieve news category furthermore if there are plentiful of news to be categorized. News categorization is a technique can be used to retrieve a category of news which gives easiness for users. Internet has vast amounts of information especially at news. Therefore, accurate and speedy access is becoming ever more difficult. This paper compares a news categorization using k -Nearest Neighbor, Naive Bayes and Support Vector Machine. Using vary of variables and through a several steps of preprocessing which proving k-Nearest Neighbor is producing a capable accuracy competes with Support Vector Machine whereas Naive Bayes producing just an average result, not as good as k -Nearest Neighbor and Support Vector Machine yet as bad as k -Nearest Neighbor and Support Vector Machine ever reach. As the results, k -Nearest Neighbor using correlation measurement type produces the best result of this experiment.

[1]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[2]  M. E. Maron,et al.  On Relevance, Probabilistic Indexing and Information Retrieval , 1960, JACM.

[3]  David D. Lewis,et al.  Representation and Learning in Information Retrieval , 1991 .

[4]  Andrew McCallum,et al.  A comparison of event models for naive bayes text classification , 1998, AAAI 1998.

[5]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[6]  Songbo Tan,et al.  An effective refinement strategy for KNN text classifier , 2006, Expert Syst. Appl..

[7]  David R. Karger,et al.  Tackling the Poor Assumptions of Naive Bayes Text Classifiers , 2003, ICML.

[8]  Zhi-Hua Zhou,et al.  A k-nearest neighbor based algorithm for multi-label classification , 2005, 2005 IEEE International Conference on Granular Computing.

[9]  Shixiong Xia,et al.  An Improved KNN Text Classification Algorithm Based on Clustering , 2009, J. Comput..

[10]  Muhammad Rafi,et al.  Comparing SVM and naïve Bayes classifiers for text categorization with Wikitology as knowledge enrichment , 2011, 2011 IEEE 14th International Multitopic Conference.

[11]  Hans Peter Luhn,et al.  A Statistical Approach to Mechanized Encoding and Searching of Literary Information , 1957, IBM J. Res. Dev..

[12]  Xu Xin,et al.  Advances in Machine Learning Based Text Categorization , 2006 .

[13]  Ali Selamat,et al.  Web news classification using neural networks based on PCA , 2002, Proceedings of the 41st SICE Annual Conference. SICE 2002..

[14]  Rong Jin,et al.  Understanding bag-of-words model: a statistical framework , 2010, Int. J. Mach. Learn. Cybern..

[15]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[16]  D.M. Mount,et al.  An Efficient k-Means Clustering Algorithm: Analysis and Implementation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..