Improving Arabic morphological analyzers benchmark

Abstract: Tools for Arabic natural language processing have developed significantly in recent years. Among them, Arabic morphological analyzers are especially important because they are often embedded in more advanced systems such as syntactic parsers, search engines, and machine translation systems. Researchers must therefore decide which morphological analyzer to use in their projects, a difficult choice given the many criteria involved. To facilitate this choice, we addressed the problem of benchmarking morphological analyzers in a previous work, proposing a solution that reports a set of metrics for each analyzer: accuracy, precision, recall, F-measure, and execution time. In this article, we present two major improvements to that solution: the first version of our corpus dedicated to the evaluation of morphological analyzers, and a new metric that combines all the result-related metrics with the execution time of the analyzers.
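As an illustration of the per-analyzer metrics listed above, the following is a minimal sketch of how they could be computed against a gold-standard corpus. It is not the benchmark's actual implementation: the function name `evaluate`, the toy corpus, the transliterated word forms, and the toy analyzer are all hypothetical. It also assumes that accuracy means the share of words for which at least one returned analysis matches the gold standard, which the abstract does not specify. The combined metric introduced in the article is not reproduced here because the abstract does not give its definition.

```python
import time
from typing import Callable, Dict, Set


def evaluate(analyzer: Callable[[str], Set[str]],
             gold: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score an analyzer against gold analyses (hypothetical helper)."""
    # Time only the analyzer calls, not the scoring.
    start = time.perf_counter()
    predictions = {word: analyzer(word) for word in gold}
    elapsed = time.perf_counter() - start

    tp = fp = fn = 0
    words_hit = 0
    for word, expected in gold.items():
        predicted = predictions[word]
        correct = predicted & expected
        tp += len(correct)               # analyses returned and in the gold set
        fp += len(predicted - expected)  # analyses returned but not in the gold set
        fn += len(expected - predicted)  # gold analyses the analyzer missed
        if correct:
            words_hit += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Assumption: accuracy = share of words with at least one correct analysis.
    accuracy = words_hit / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "seconds": elapsed}


# Toy gold standard and toy analyzer output (both invented for illustration).
GOLD = {
    "kitab": {"kitab+NOUN"},
    "kataba": {"katab+VERB"},
    "ila": {"ila+PREP"},
}
TOY_OUTPUT = {
    "kitab": {"kitab+NOUN", "katab+VERB"},  # one correct, one spurious analysis
    "kataba": {"katab+VERB"},               # correct
    "ila": set(),                           # no analysis returned
}


def toy_analyzer(word: str) -> Set[str]:
    return TOY_OUTPUT.get(word, set())


if __name__ == "__main__":
    print(evaluate(toy_analyzer, GOLD))
```

Counting at the level of individual analyses, rather than whole words, lets an analyzer that returns many spurious candidates be penalized in precision even when it always includes the correct one.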
