Unsupervised Stemmer for Arabic Tweets

Stemming, which reduces words to their stems, is an essential step in a wide range of high-level text processing applications such as information extraction, machine translation, and sentiment analysis. Many stemming algorithms have been developed for Modern Standard Arabic (MSA). Although the Arabic used in tweets and MSA are closely related and share many characteristics, they differ substantially in lexicon and syntax. In this paper, we introduce a light stemmer for Arabic tweets. Our results show that it outperforms a number of well-known Arabic stemmers.
