Kannpos-Kannada Parts of Speech Tagger Using Conditional Random Fields

Parts of Speech (POS) tagging is one of the basic text processing tasks in Natural Language Processing (NLP). Developing a POS tagger for Indian languages is challenging, and especially so for Kannada, because of its rich morphology and highly agglutinative nature. This paper describes a Kannada POS tagger built with Conditional Random Fields (CRFs), a supervised machine learning technique. The results are based on experiments on a corpus of 80,000 words, of which 64,000 are used for training and 16,000 for testing. The words were collected from Kannada Wikipedia and annotated with POS tags from the Technology Development for Indian Languages (TDIL) tagset, which contains 36 tags. The n-gram CRF model achieved a maximum accuracy of 92.94%. This work is an extension of “Parts of Speech (POS) Tagger for Kannada Using Conditional Random Fields (CRFs)”.
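As a rough illustration of the setup described above, the following Python sketch shows an 80/20 corpus split and a simple n-gram context feature function of the kind typically fed to a CRF toolkit. It is not the authors' implementation: the function names `split_corpus` and `ngram_features`, the window size, the boundary markers, and the suffix features are all assumptions made for illustration, since the abstract does not specify the feature templates.

```python
from typing import Dict, List, Tuple


def split_corpus(tokens: List[str], train_percent: int = 80) -> Tuple[List[str], List[str]]:
    """Split a token list into training and test portions.

    Integer arithmetic is used so that 80,000 tokens give exactly
    64,000 for training and 16,000 for testing.
    """
    cut = len(tokens) * train_percent // 100
    return tokens[:cut], tokens[cut:]


def ngram_features(sentence: List[str], i: int, window: int = 1) -> Dict[str, str]:
    """Build context features for the token at position i.

    Assumption: the current word, its neighbours within `window`,
    and short suffixes are used as features. Suffixes are a common
    cue for agglutinative languages, but the abstract does not say
    whether the original tagger uses them.
    """
    word = sentence[i]
    feats = {"bias": "1", "w0": word}

    # Suffix features (hypothetical choice of lengths 1 to 3).
    for n in (1, 2, 3):
        if len(word) > n:
            feats[f"suffix{n}"] = word[-n:]

    # Neighbouring words, with hypothetical boundary markers.
    for off in range(1, window + 1):
        left, right = i - off, i + off
        feats[f"w-{off}"] = sentence[left] if left >= 0 else "<BOS>"
        feats[f"w+{off}"] = sentence[right] if right < len(sentence) else "<EOS>"
    return feats
```

Each sentence would be converted to a list of such feature dictionaries, paired with its list of TDIL tags, and passed to a CRF trainer. Widening `window` extends the n-gram context at the cost of sparser features.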
