A Novel Hybrid Approach to Arabic Named Entity Recognition

Named Entity Recognition (NER) is an essential preprocessing task for many Natural Language Processing (NLP) applications, such as text summarization, document categorization, and information retrieval. NER systems follow either a rule-based approach or a machine learning approach. In this paper, we introduce a novel NER system for Arabic that uses a hybrid approach, combining a rule-based component with a machine learning component to improve the performance of Arabic NER. The system recognizes three types of named entities: Person, Location, and Organization. Experimental results on the ANERcorp dataset show that our hybrid approach outperforms both the rule-based approach and the machine learning approach when each is used on its own. It also outperforms the state-of-the-art hybrid Arabic NER systems.
