A Hybrid Statistical Approach for Named Entity Recognition for Malayalam Language

Named-Entity Recognition (NER) plays a significant role in locating atomic elements in text and classifying them into predefined categories such as names of persons, organizations, and locations, temporal expressions, quantities, monetary values, and percentages. Several statistical methods using supervised and unsupervised learning have been applied successfully to English and to some Indian languages. Malayalam nouns have the distinctive feature of no subject-verb agreement, and the language has free word order, which makes NER a complex process. In this paper, a hybrid approach combining rule-based machine learning with a statistical approach is proposed and implemented; it achieves 73.42% accuracy.
