ISSUES AND CHALLENGES IN MARATHI NAMED ENTITY RECOGNITION

Information Extraction (IE) is a subdiscipline of Artificial Intelligence. IE identifies information in unstructured sources that adheres to predefined semantic categories, such as people and locations. Recognition of named entities (NEs) in machine-readable natural language text is a significant task in IE and natural language processing (NLP), and named entity (NE) extraction is an important step in processing unstructured content. Unstructured data is computationally opaque, whereas computers require computationally transparent data for processing. IE adds meaning to raw data so that it can be easily processed by computers. Various approaches have been applied to the extraction of entities from text. This paper elaborates on the need for NE recognition for Marathi and discusses the issues and challenges involved in NE recognition tasks for the Marathi language. It also explores various methods and techniques that are useful for creating the learning resources and lexicons that are important for extracting NEs from unstructured natural language text.
