An Approach for Extracting and Disambiguating Arabic Persons' Names Using Clustered Dictionaries and Scored Patterns

Building a system to extract Arabic named entities is a complex task due to the ambiguity and structure of Arabic text. Previous approaches to Arabic named entity recognition have relied heavily on Arabic parsers and taggers, combined with large sets of gazetteers and sometimes large training sets, to resolve this ambiguity. While these approaches are applicable to Modern Standard Arabic (MSA) text, they cannot handle colloquial Arabic. With the rapid increase in online social media usage by Arabic speakers, it is important to build an Arabic named entity recognition system that deals with both colloquial Arabic and MSA text. This paper introduces an approach for extracting Arabic persons' names without utilizing any Arabic parsers or taggers. Evaluation of the presented approach shows that it achieves high precision and an acceptable level of recall on a benchmark dataset.
