Persian Named Entity Recognition

Named Entity Recognition (NER) is an important natural language processing (NLP) tool for information extraction and retrieval from unstructured texts such as newspapers, blogs and emails. NER processes unstructured text to classify words or expressions into relevant categories. In the literature, NER systems have been developed for various languages, but only limited work has addressed NER for Persian text. This is due to the scarcity of resources (such as corpora and lexicons) and tools for Persian named entities. In this paper, a novel scalable system for Persian Named Entity Recognition (PNER) is presented. The proposed PNER can recognize and extract the three most important named entity types in Persian script: person name, location and date. It combines a grammatical rule-based approach with machine learning, integrating dictionaries of Persian named entities, Persian grammar rules and a Support Vector Machine (SVM). In the performance evaluation, PNER achieves precision, recall and F-measure comparable to those of state-of-the-art NER frameworks for other languages.
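For readers unfamiliar with the three evaluation measures named above, the following is a minimal sketch of how precision, recall and F-measure are commonly computed for NER output. It assumes exact-match scoring over (start, end, label) spans and the balanced F1 variant; the abstract does not state which matching scheme or F-measure weighting the authors used, and the function name and example spans are purely illustrative.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted entities against gold entities.

    Both arguments are collections of (start, end, label) tuples.
    Assumption: an entity counts as correct only if its span and
    its label both match a gold entity exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)

    # Precision: share of predicted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold entities that were found.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical token spans using the three entity types from the abstract.
gold = {(0, 2, "PERSON"), (5, 6, "LOCATION"), (9, 11, "DATE")}
predicted = {(0, 2, "PERSON"), (5, 6, "DATE"), (9, 11, "DATE")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case two of the three predicted entities match the gold annotation exactly, while the location span is mislabeled as a date, so it counts against both precision and recall.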
