Performance of KNN and SVM classifiers on full word Arabic articles

This paper reports a comparative study of two machine learning methods on Arabic text categorization. Based on a collection of news articles as a training set, and another set of news articles as a testing set, we evaluated K nearest neighbor (KNN) algorithm, and support vector machines (SVM) algorithm. We used the full word features and considered the tf.idf as the weighting method for feature selection, and CHI statistics as a ranking metric. Experiments showed that both methods were of superior performance on the test corpus while SVM showed a better micro average F1 and prediction time.

[1]  Thorsten Joachims,et al.  A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization , 1997, ICML.

[2]  Laila Khreisat,et al.  Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study , 2006, DMIN.

[3]  Yiming Yang,et al.  Expert network: effective and efficient learning from human decisions in text categorization and retrieval , 1994, SIGIR '94.

[4]  Ah-Hwee Tan,et al.  A Comparative Study on Chinese Text Categorization Methods , 2000, PRICAI Workshop on Text and Web Mining.

[5]  Yiming Yang,et al.  An example-based mapping method for text categorization and retrieval , 1994, TOIS.

[6]  Alaa M. El-Halees,et al.  Arabic Text Classification Using Maximum Entropy , 2015 .

[7]  David L. Waltz,et al.  Classifying news stories using memory based reasoning , 1992, SIGIR '92.

[8]  Bernhard Schölkopf,et al.  Support vector learning , 1997 .

[9]  Wai Lam,et al.  Using a generalized instance set for automatic text categorization , 1998, SIGIR '98.

[10]  Yiming Yang,et al.  An Evaluation of Statistical Approaches to Text Categorization , 1999, Information Retrieval.

[11]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[12]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[13]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[14]  John A. Kapenga,et al.  Computing in the 90's: The First Great Lakes Computer Science Conference, Kalamazoo Michigan, USA, October 18-20, 1989. Proceedings , 1991 .