Named Entity Recognition and Classification in Kannada Language

Named Entity Recognition and Classification (NERC) is an essential and challenging task in Natural Language Processing (NLP). Kannada is a highly inflectional and agglutinating language, providing one of the richest and most challenging sets of linguistic and statistical features and resulting in long and complex word forms, which are large in number. It is primarily a suffixing language: an inflected word starts with a root and may have several suffixes added to the right. It is also a free word order language. Like other Indian languages, it is a resource-poor language. Annotated corpora, name dictionaries, good morphological analyzers, Parts of Speech (POS) taggers, etc. are not yet available in the required measure, and not many works are reported for this language. No work on NERC in Kannada has yet been reported. In recent years, automatic named entity recognition and extraction systems have become one of the popular research areas. NERC seeks to classify words which represent names in text into predefined categories such as person name, location, organization, date, and time. Building NERC for Kannada is challenging, and this paper deals with some attempts in this direction. The work starts with experiments in building semi-automated statistical machine learning NLP models based on noun taggers. In this paper we have developed an algorithm based on supervised learning techniques that include the Hidden Markov Model (HMM). Some sample results are reported.
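The abstract does not describe the algorithm itself, so the following is only a minimal sketch of how a supervised HMM tagger for named entities could look in general, not the authors' method. It assumes a first-order HMM with add-one smoothing and Viterbi decoding. The tag set (PER, LOC, O), the function names, and the romanized toy sentences are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sentence marker


def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from (word, tag) sequences."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), len(vocab)


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    trans, emit, tags, vocab_size = model
    n_tags = len(tags)
    n_words = vocab_size + 1  # one extra outcome reserved for unseen words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{
        t: (log_prob(trans[START], t, n_tags)
            + log_prob(emit[t], words[0], n_words), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy, made-up training data (romanized tokens; not from any real corpus).
toy_corpus = [
    [("rama", "PER"), ("mysuru", "LOC"), ("hodanu", "O")],
    [("sita", "PER"), ("bengaluru", "LOC"), ("hodalu", "O")],
    [("rama", "PER"), ("bengaluru", "LOC"), ("bandanu", "O")],
]
model = train_hmm(toy_corpus)
print(viterbi(["sita", "mysuru", "hodanu"], model))
```

A word-level model like this treats every inflected form as a separate token. For an agglutinating language such as Kannada, a practical system would likely need suffix or root features on top of it, which this sketch leaves out.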
