Comparison of rule-based classification techniques for Arabic textual data

Text categorisation has recently attracted many scholars because the World Wide Web (WWW) holds a large number of documents containing hidden, useful information that organisations' managers can use for decision making. However, most text categorisation research has dealt with English data collections, and there have been few attempts to mine Arabic corpora. This paper investigates Arabic text categorisation and measures the performance of different rule-based classification data mining techniques. Specifically, four rule-based classification approaches (C4.5, RIPPER, PART, and OneRule) are compared on the well-known CCA Arabic text data set. Experiments are carried out using a modified version of the WEKA business intelligence tool. The results show that OneRule is the least suitable algorithm for classifying Arabic texts, whereas RIPPER, C4.5, and PART perform similarly with respect to error rate.
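To make the OneRule baseline concrete, the following is a minimal sketch of the OneR idea: for every attribute, map each attribute value to its majority class, count the training errors, and keep the single attribute with the fewest errors. It assumes documents that are already tokenised and represented by binary term-presence features, and it uses a tiny hypothetical corpus (not the CCA data set). All function names (`binarize`, `one_r`, `predict`, `error_rate`) are illustrative, and this is not the paper's modified WEKA implementation. For example, the numeric-attribute discretisation that WEKA's OneR performs is omitted.

```python
from collections import Counter, defaultdict


def binarize(docs, vocab):
    """Map each tokenised document to a 0/1 term-presence vector over vocab."""
    rows = []
    for doc in docs:
        present = set(doc)
        rows.append([1 if term in present else 0 for term in vocab])
    return rows


def one_r(rows, labels):
    """Learn a OneR rule: the single attribute whose value -> majority-class
    mapping makes the fewest errors on the training data.
    Returns (attribute_index, value_to_class_rule, training_errors)."""
    best = None
    for attr in range(len(rows[0])):
        # Class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # Each attribute value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        # Keep the first attribute with the lowest training error.
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


def predict(model, row, default):
    """Apply the learned rule; a value unseen in training falls back to default."""
    attr, rule, _ = model
    return rule.get(row[attr], default)


def error_rate(y_true, y_pred):
    """Fraction of misclassified documents."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy corpus (not the CCA data set): pre-tokenised Arabic documents.
train_docs = [["مباراة", "كرة", "فريق"], ["كرة", "هدف"],
              ["سوق", "اسهم", "اقتصاد"], ["اقتصاد", "بنك"]]
train_labels = ["sport", "sport", "economy", "economy"]
test_docs = [["اقتصاد", "نمو"], ["كرة", "ملعب"]]
test_labels = ["economy", "sport"]

vocab = sorted({term for doc in train_docs for term in doc})
model = one_r(binarize(train_docs, vocab), train_labels)
majority = Counter(train_labels).most_common(1)[0][0]
predictions = [predict(model, row, majority) for row in binarize(test_docs, vocab)]
print(vocab[model[0]], predictions, error_rate(test_labels, predictions))
```

On this toy data the learned rule rests on a single term. That restriction distinguishes OneR from C4.5, RIPPER, and PART, which can combine several term tests in a path or rule. This may help explain why a one-attribute rule can struggle on a multi-category text collection, although the sketch says nothing about the paper's actual results.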
