Arabic text stemming: Comparative analysis

Text classification is one of the most important research issues in the field of data mining. The main idea behind stemming is to reduce the number of features extracted from a document; stemming also aims to improve the accuracy of the classifier. This paper studies the effectiveness of stemming techniques using two popular Arabic stemmers, the Khoja stemmer and the Light stemmer, and compares their results with those of classification performed without stemming. In the experiments, Sequential Minimal Optimization (SMO), Naive Bayes (NB), J48, and k-nearest neighbors (KNN) were used to build the training models and test the data. Both stemming approaches were applied and evaluated using precision, recall, and F-measure; the results show that the Light stemmer outperforms the Khoja stemmer.
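To make two ideas from the abstract concrete, here is a minimal Python sketch. It shows how light stemming (stripping affixes without extracting roots) collapses surface forms so that fewer distinct features remain. It also shows how per-class precision, recall, and F-measure are computed. The names `light_stem`, `precision_recall_f1`, `PREFIXES`, and `SUFFIXES` are hypothetical. The affix lists are an assumption loosely modeled on the widely cited light10 rules of Larkey et al., not necessarily the Light stemmer configuration used in this paper. The Khoja root extractor is not shown because it relies on root and pattern dictionaries.

```python
# ASSUMPTION: illustrative affix lists loosely modeled on light10;
# not necessarily the exact Light stemmer configuration used in the paper.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال"]  # article forms, longest first
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str) -> str:
    """Strip a leading conjunction, one article form, then suffixes (no root extraction)."""
    # Drop the conjunction "و" only if at least 3 characters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Remove at most one definite-article form if at least 2 characters remain.
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    # Scan the suffix list once, in order, removing each match that leaves >= 2 characters.
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[: -len(s)]
    return word


def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall, and F-measure (harmonic mean of the two)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    tokens = ["والكتاب", "الكتاب", "كتاب", "مكتبة", "مكتباتها"]
    stems = [light_stem(t) for t in tokens]
    # Distinct features before vs. after stemming.
    print(len(set(tokens)), len(set(stems)))  # → 5 2
    print(precision_recall_f1(["sport", "sport", "economy"],
                              ["sport", "economy", "economy"], "sport"))
```

Here five distinct surface forms shrink to two stems (كتاب and مكتب), which is the feature reduction the abstract refers to. Because affixes are stripped without matching roots, a broken plural such as الكتب becomes كتب rather than كتاب. Root-based stemmers such as Khoja's aim to conflate such forms, at the cost of more aggressive reduction.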
