Naïve Bayesian Based on Chi Square to Categorize Arabic Data

Text classification is a supervised technique that uses labelled training data to learn a classification model and then uses the learned model to assign categories to new, unlabelled texts automatically. This paper investigates the Naïve Bayes algorithm combined with the Chi Square feature selection method. Our comparisons are based on the macro-averaged F1, recall and precision evaluation measures. Experimental results on several Arabic text categorization data sets provide evidence that feature selection often increases classification accuracy by removing rare terms.
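The abstract does not spell out the exact formulation, so the following is only a minimal sketch under stated assumptions. It scores each term with the standard chi-square statistic over a 2x2 term/category table of document counts (A, B, C, D) and keeps the k highest-scoring terms. Taking each term's maximum score over categories is one common choice, assumed here. It then trains a multinomial Naïve Bayes model with add-one smoothing on the reduced vocabulary and reports macro-averaged precision, recall and F1, where macro F1 is assumed to be the mean of the per-class F1 scores. All function names, the value of k, and the toy tokens (English placeholders for stemmed Arabic words) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square value over all categories.

    docs: list of token lists; labels: one category name per document.
    Uses document presence/absence counts, i.e. the 2x2 table (A, B, C, D).
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    cat_sizes = Counter(labels)
    term_df = Counter(t for s in doc_sets for t in s)  # docs containing t
    term_cat_df = Counter((t, c) for s, c in zip(doc_sets, labels) for t in s)
    scores = {}
    for t, df in term_df.items():
        best = 0.0
        for c, nc in cat_sizes.items():
            a = term_cat_df[(t, c)]  # in c, contains t
            b = df - a               # not in c, contains t
            cc = nc - a              # in c, lacks t
            d = n - a - b - cc       # not in c, lacks t
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:  # undefined when a margin is empty; score stays 0
                best = max(best, n * (a * d - cc * b) ** 2 / denom)
        scores[t] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


class MultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing over a fixed vocabulary."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        sizes = Counter(labels)
        self.log_prior = {c: math.log(sizes[c] / len(labels)) for c in self.classes}
        counts = defaultdict(Counter)
        for doc, c in zip(docs, labels):
            counts[c].update(t for t in doc if t in vocab)  # drop unselected terms
        self.log_lik = {}
        for c in self.classes:
            total = sum(counts[c].values()) + len(vocab)
            self.log_lik[c] = {t: math.log((counts[c][t] + 1) / total) for t in vocab}
        return self

    def predict(self, doc):
        """Return argmax_c of log P(c) + sum of log P(t | c) over in-vocabulary tokens."""
        def log_posterior(c):
            return self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=log_posterior)


def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall and F1 (F1 = mean of per-class F1)."""
    classes = sorted(set(y_true) | set(y_pred))
    p_sum = r_sum = f_sum = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        p_sum, r_sum, f_sum = p_sum + prec, r_sum + rec, f_sum + f1
    k = len(classes)
    return p_sum / k, r_sum / k, f_sum / k


# Hypothetical toy data: English placeholders standing in for stemmed Arabic words.
train_docs = [["match", "team", "goal", "the"], ["team", "coach", "goal", "the"],
              ["market", "bank", "price", "the"], ["bank", "oil", "price", "the"]]
train_labels = ["sport", "sport", "economy", "economy"]
test_docs = [["goal", "team", "the"], ["price", "bank", "oil"], ["team", "goal", "price"]]
test_labels = ["sport", "economy", "economy"]

vocab = select_features(train_docs, train_labels, k=4)  # k chosen for illustration
model = MultinomialNB().fit(train_docs, train_labels, vocab)
preds = [model.predict(d) for d in test_docs]
metrics = macro_scores(test_labels, preds)
print(sorted(vocab))                        # ['bank', 'goal', 'price', 'team']
print(preds)                                # ['sport', 'economy', 'sport']
print(tuple(round(x, 3) for x in metrics))  # (0.75, 0.75, 0.667)
```

In this toy run, the term "the" occurs in every training document, so its chi-square score is 0 and it is discarded. The terms that occur in only one training document score lower than the category-specific terms and fall outside the top k, which mirrors the abstract's point that feature selection prunes uninformative and rare terms.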
