Hybrid Approach for Marathi Named Entity Recognition

This paper describes a named entity recognition (NER) system that combines a hidden Markov model (HMM), handcrafted rules, and gazetteers to recognize named entities (NEs) in Marathi. The system aims to recognize twelve types of NEs in Marathi text. Marathi is a morphologically rich, inflectional language; inflections in NEs are handled through lemmatization. Zero and low probability estimates caused by sparse data are handled with pseudo-word replacement and smoothing. The Viterbi algorithm is used for decoding and word disambiguation. Gazetteers and grammar rules further improve the system's performance.
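To make the decoding step concrete, the following is a minimal sketch of HMM tagging with pseudo-word replacement, smoothing, and Viterbi decoding. It is not the system described above: the abstract does not specify the smoothing method or the pseudo-word classes, so add-one (Laplace) smoothing and the two classes `<NUM>` and `<UNK>` are assumptions, as are all function names, the three-tag set, and the transliterated toy corpus. Lemmatization, gazetteers, and grammar rules are omitted.

```python
import math
from collections import defaultdict

def pseudo_word(word, vocab):
    """Map an out-of-vocabulary word to a coarse class token (assumed classes)."""
    if word in vocab:
        return word
    return "<NUM>" if word.isdigit() else "<UNK>"

def train(tagged_sentences):
    """Collect transition and emission counts from (word, tag) sequences."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab, tags = {"<NUM>", "<UNK>"}, set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start state
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            tags.add(tag)
            prev = tag
    return trans, emit, vocab, sorted(tags)

def log_prob(counts, context, event, n_events):
    """Add-one smoothed log probability, so unseen events never get zero mass."""
    total = sum(counts[context].values())
    return math.log((counts[context][event] + 1) / (total + n_events))

def viterbi(words, trans, emit, vocab, tags):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    obs = [pseudo_word(w, vocab) for w in words]
    n_vocab, n_tags = len(vocab), len(tags)
    best = [{t: log_prob(trans, "<s>", t, n_tags) + log_prob(emit, t, obs[0], n_vocab)
             for t in tags}]
    back = []
    for o in obs[1:]:
        scores, pointers = {}, {}
        for t in tags:
            # Best predecessor for tag t at this position.
            prev = max(tags, key=lambda p: best[-1][p] + log_prob(trans, p, t, n_tags))
            scores[t] = (best[-1][prev] + log_prob(trans, prev, t, n_tags)
                         + log_prob(emit, t, o, n_vocab))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Invented toy corpus (transliterated tokens, three tags instead of twelve).
corpus = [
    [("sachin", "PER"), ("mumbai", "LOC"), ("madhye", "O"), ("rahto", "O")],
    [("pune", "LOC"), ("madhye", "O"), ("paus", "O"), ("padla", "O")],
]
model = train(corpus)
print(viterbi(["sachin", "nagpur", "madhye"], *model))
```

Here `nagpur` is absent from the toy corpus, so it is replaced by `<UNK>` before decoding, and smoothing keeps every transition and emission probability nonzero. A fuller treatment would typically also replace rare training words with pseudo-words so that the class tokens receive observed counts.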
