New rules-based algorithm to improve Arabic stemming accuracy

In recent years, natural language processing has grown steadily, driven by the spread of the internet. However, efforts devoted to the Arabic language remain limited. Because of its morphological and semantic properties, Arabic is considered a difficult language for automatic processing. Many different approaches have therefore been proposed to handle morphological variation and agglutination when stemming Arabic text. Stemming and light stemming remove irrelevant morphological variations from a given word in order to extract its original stem or root. This research introduces a completely new rules-based algorithm. It first removes affixes precisely, using context-sensitive morphological rules, and then deduces the root according to a predefined set of rules. The results show that the proposed algorithm is more accurate than two well-known Arabic stemmers.
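The two stages described above (guarded affix removal, then rule-based root deduction) can be illustrated with a minimal Python sketch. It is not the algorithm proposed in this paper: the affix lists, the minimum stem length, the pattern table, and all function names are assumptions chosen only to show the general shape of a rules-based stemmer.

```python
# Toy rules-based Arabic stemmer: a sketch under stated assumptions,
# NOT the rule set of the paper.

# Illustrative affix lists (assumed for this example only).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]
SUFFIXES = ["ات", "ون", "ين", "ان", "ها", "ية", "ة", "ه", "ي"]

# Simple guard standing in for a context-sensitive rule:
# never strip an affix if fewer than 3 letters would remain.
MIN_STEM_LEN = 3

# Illustrative morphological patterns; the letters ف ع ل mark root slots.
PATTERNS = ["مفعول", "مفعل", "فاعل", "فعال"]
ROOT_SLOTS = set("فعل")


def strip_prefix(word):
    """Remove the longest listed prefix that leaves a long enough stem."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            return word[len(prefix):]
    return word


def strip_suffix(word):
    """Remove the longest listed suffix that leaves a long enough stem."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[:-len(suffix)]
    return word


def light_stem(word):
    """Light stemming: strip one prefix, then one suffix."""
    return strip_suffix(strip_prefix(word))


def match_pattern(stem, pattern):
    """Return the root letters if the stem fits the pattern, else None."""
    if len(stem) != len(pattern):
        return None
    root = []
    for stem_char, pattern_char in zip(stem, pattern):
        if pattern_char in ROOT_SLOTS:
            root.append(stem_char)        # slot position: collect root letter
        elif stem_char != pattern_char:
            return None                   # fixed letter does not match
    return "".join(root)


def extract_root(word):
    """Light-stem the word, then try each pattern; fall back to the stem."""
    stem = light_stem(word)
    for pattern in PATTERNS:
        root = match_pattern(stem, pattern)
        if root is not None:
            return root
    return stem
```

For example, under these assumed rules `light_stem("المعلمون")` strips the prefix `ال` and the suffix `ون` to give `معلم`, and `extract_root("المعلمون")` then matches the pattern `مفعل` to give `علم`. The length guard keeps a short word such as `دين` intact even though it ends in a listed suffix.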
