A Novel Approach for Detecting Arabic Persons' Names using Limited Resources

Named entity recognition is an involved task that usually requires numerous resources. Recognizing Arabic entities is even more difficult because of the inherent ambiguity of the Arabic language. Previous approaches to Arabic named entity recognition have used Arabic parsers and taggers combined with large sets of gazetteers and, in some cases, large training sets. However, the recent surge in social media usage, where colloquial Arabic rather than Modern Standard Arabic is used, invalidates these approaches because existing parsers fail to parse colloquial Arabic at an acceptable level of precision. To address these limitations, this paper presents an approach for recognizing Arabic persons' names without using any Arabic parsers or taggers. The approach uses only a limited set of publicly available dictionaries, which it integrates with a statistical model based on association rules for extracting patterns that indicate the occurrence of persons' names. Through experimentation on a benchmark dataset, we show that the performance of the presented technique is comparable to that of the state-of-the-art machine learning approach.
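The abstract does not spell out how the association rules are mined, so the following is only a minimal sketch of the general idea, not the paper's actual method. It assumes rules of the form "previous word = w implies the current token is a person's name", scored by support (here an absolute count) and confidence. The function name `mine_trigger_rules`, the thresholds, and the transliterated toy sentences are all hypothetical; in practice the name flags might come from a dictionary lookup, which is also an assumption.

```python
from collections import Counter

def mine_trigger_rules(sentences, min_support=2, min_confidence=0.6):
    """Mine rules 'previous word = w  =>  current token is a name'.

    sentences: list of lists of (token, is_name) pairs.
    Returns {w: (support, confidence)} for rules passing both thresholds,
    where support is the number of times w directly precedes a name and
    confidence is that count divided by all occurrences of w as a
    preceding word.
    """
    antecedent = Counter()  # times w is followed by any token
    both = Counter()        # times w is followed by a name token
    for sent in sentences:
        for (prev, _), (_, is_name) in zip(sent, sent[1:]):
            antecedent[prev] += 1
            if is_name:
                both[prev] += 1

    rules = {}
    for w, n in both.items():
        conf = n / antecedent[w]
        if n >= min_support and conf >= min_confidence:
            rules[w] = (n, conf)
    return rules

# Hand-made toy data (hypothetical, not taken from the paper's dataset).
toy = [
    [("qala", False), ("ahmad", True), ("inna", False)],
    [("qala", False), ("khalid", True)],
    [("qala", False), ("al-wazir", False)],
    [("al-duktur", False), ("samir", True)],
    [("al-duktur", False), ("mona", True)],
]
rules = mine_trigger_rules(toy)
for word, (support, confidence) in sorted(rules.items()):
    print(f"{word} => NAME  support={support}  confidence={confidence:.2f}")
```

On this toy input, both "qala" and "al-duktur" pass the thresholds, while a word such as "ahmad", which never precedes a name, yields no rule. Raising `min_confidence` trades recall for precision in the mined patterns.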
