RENAR: A Rule-Based Arabic Named Entity Recognition System

Named entity recognition supports many natural language processing tasks, such as information retrieval, machine translation, and question answering. Researchers have addressed name identification in a variety of languages, and recent research efforts have started to focus on named entity recognition for Arabic. We present a working Arabic information extraction (IE) system that analyzes large volumes of news text every day to extract the named entity (NE) types person, organization, location, date, and number, as well as quotations (direct reported speech) by and about people. The named entity recognition (NER) system was not originally developed for Arabic; instead, an existing multilingual NER system was adapted to cover Arabic as well. Arabic, a Semitic language, differs substantially from the Indo-European and Finno-Ugric languages the system currently covers. This article therefore describes which Arabic-specific resources had to be developed and which changes had to be made to the rule set to make it applicable to Arabic. The evaluation results are generally satisfactory but could be improved for certain entity types.
