Comparison of different POS Tagging Techniques (n-gram, HMM and Brill’s tagger) for Bangla

Part-of-speech (POS) tagging is the problem of assigning a part-of-speech tag to each word of a text, and there are several approaches to it. In this paper we compare the performance of several POS tagging techniques for the Bangla language: statistical approaches (n-gram, HMM) and a transformation-based approach (Brill's tagger). A supervised POS tagging approach requires a large annotated training corpus to tag accurately. At this initial stage of POS tagging for Bangla, only a very limited annotated corpus is available, so we investigated which technique performs best with this limited resource. We also evaluated the same techniques on English to estimate how they might perform on Bangla once a substantial annotated corpus becomes available.
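To make the n-gram approach concrete, the following is a minimal sketch of a bigram tagger that backs off to a unigram tagger and then to a default tag. It is not the implementation evaluated in the paper: the toy English corpus, the tag names, the function names, and the choice of `NOUN` as the default tag are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data (not from the paper): (word, tag) pairs.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train_unigram(sentences):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def train_bigram(sentences):
    """Map (previous tag, word) to the most frequent tag in that context."""
    counts = defaultdict(Counter)
    for sent in sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sent:
            counts[(prev, word)][tag] += 1
            prev = tag
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}


def tag_sentence(words, bigram, unigram, default="NOUN"):
    """Tag left to right: bigram first, then unigram, then the default tag."""
    prev, out = "<s>", []
    for w in words:
        t = bigram.get((prev, w)) or unigram.get(w, default)
        out.append((w, t))
        prev = t
    return out


def accuracy(gold_sentences, bigram, unigram):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold_sentences:
        predicted = tag_sentence([w for w, _ in sent], bigram, unigram)
        for (_, gold), (_, pred) in zip(sent, predicted):
            correct += gold == pred
            total += 1
    return correct / total if total else 0.0


unigram_model = train_unigram(TRAIN)
bigram_model = train_bigram(TRAIN)
print(tag_sentence(["the", "cat", "barks"], bigram_model, unigram_model))
```

With a small training corpus, most (previous tag, word) contexts are never observed, so the backoff chain does much of the work. This is one reason the amount of annotated data matters so much when comparing supervised taggers.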
