Arabic text classification methods: Systematic literature review of primary studies

Recent research on Big Data has proposed and evaluated a number of advanced techniques for extracting meaningful information from the large and complex volume of data available on the World Wide Web. Accurate text analysis usually begins with a Text Classification (TC) step. A review of the most recent literature in this area shows that most studies focus on English (and other scripts), while attempts to classify Arabic text remain very limited. We therefore contribute the first Systematic Literature Review (SLR) that strictly follows a search protocol to summarize the key characteristics of the different TC techniques and methods used to classify Arabic text. This work also aims to identify and share scientific evidence of gaps in the current literature in order to suggest areas for further research. Our SLR explicitly uses empirical evidence as a criterion for including studies, and then determines which classifiers produced the most accurate results. Further, our findings identify a lack of standardized corpora for Arabic text: authors compile their own, and most of the work focuses on Modern Arabic, with very little done on Colloquial Arabic despite its wide use on social networks such as Twitter. In total, 1464 papers were surveyed, of which 48 primary studies were included and analyzed.
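To make the notion of a TC method concrete, the following is a minimal sketch of one classifier family widely used in text classification, multinomial Naive Bayes with Laplace smoothing. It is illustrative only and is not taken from any of the reviewed studies. It assumes whitespace tokenization and a hypothetical light normalization step (stripping diacritics and tatweel, and unifying alef variants). The tiny two-class corpus is invented for the example and is not data from the review.

```python
import math
import re
from collections import Counter, defaultdict

# Arabic diacritics (U+064B..U+0652) plus tatweel (U+0640).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Alef variants (hamza above, hamza below, madda) to be unified to bare alef.
_ALEF_VARIANTS = re.compile("[\u0623\u0625\u0622]")


def normalize(text):
    """Hypothetical light normalization: strip diacritics and tatweel,
    unify alef variants, then split on whitespace."""
    text = _DIACRITICS.sub("", text)
    text = _ALEF_VARIANTS.sub("\u0627", text)
    return text.split()


class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)    # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = normalize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values())
                            for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        tokens = normalize(doc)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            # Log prior plus sum of smoothed log likelihoods.
            score = math.log(count / n_docs)
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore tokens never seen in training
                num = self.word_counts[c][t] + self.alpha
                den = self.total_words[c] + self.alpha * v
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Invented toy corpus (hypothetical "sports" vs "economy" labels).
docs = [
    "فاز الفريق في المباراة",
    "سجل اللاعب هدفا في المباراة",
    "ارتفع سعر النفط في السوق",
    "انخفضت الأسهم في السوق المالية",
]
labels = ["sports", "sports", "economy", "economy"]

clf = MultinomialNB().fit(docs, labels)
print(clf.predict("اللاعب سجل هدفا"))     # sports
print(clf.predict("سعر النفط في السوق"))  # economy
```

Naive Bayes is shown here only because it fits in a few lines using the standard library. Published comparisons also use other classifiers, and the preprocessing (stemming, stop-word removal, feature selection) typically has a large effect on accuracy.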
