Part-of-Speech Tagger for Marathi Language using Limited Training Corpora

Part-of-speech tagging for Marathi is a complex task because Marathi is highly inflectional and has free word order. In this paper we present a rule-based part-of-speech tagger for the Marathi language. The tagger is built on hand-constructed rules: some are learned from the corpus, and others are added manually after studying Marathi grammar. Disambiguation is done by analyzing the linguistic features of the word itself, its preceding word, its following word, etc. Tested on three different data sets, the system achieved an encouraging average accuracy of 78.82%.

General Terms: Natural Language Processing.
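As a rough illustration of disambiguation that looks at a word, its preceding word, and its following word, the sketch below tags a sentence with a small lexicon plus ordered context rules. Everything in it is an assumption made for illustration: the tag names, the lexicon (English placeholder words rather than Marathi), the three rules, and the function names `candidates`, `disambiguate`, and `tag` are hypothetical and are not taken from the paper's rule set.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon: each word maps to its set of candidate tags.
# A real system would derive candidates from a corpus and morphological analysis.
LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "can": {"NOUN", "VERB"},
    "rusts": {"VERB"},
    "they": {"PRON"},
    "fish": {"NOUN", "VERB"},
}


def candidates(word: str) -> Set[str]:
    """Return candidate tags for a word; unknown words default to NOUN (an assumption)."""
    return set(LEXICON.get(word.lower(), {"NOUN"}))


def disambiguate(
    cands: Set[str],
    prev_tag: Optional[str],
    next_cands: Optional[Set[str]],
) -> str:
    """Pick one tag using hand-written rules over the left and right context."""
    if len(cands) == 1:
        return next(iter(cands))
    # Rule 1 (hypothetical): after a determiner, prefer a noun reading.
    if prev_tag == "DET" and "NOUN" in cands:
        return "NOUN"
    # Rule 2 (hypothetical): after a pronoun, prefer a verb reading.
    if prev_tag == "PRON" and "VERB" in cands:
        return "VERB"
    # Rule 3 (hypothetical): if the following word can only be a verb, prefer a noun.
    if next_cands == {"VERB"} and "NOUN" in cands:
        return "NOUN"
    # Fallback: deterministic choice when no rule fires.
    return sorted(cands)[0]


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Tag tokens left to right, passing each decision on as context for the next word."""
    cand_list = [candidates(t) for t in tokens]
    tagged: List[Tuple[str, str]] = []
    prev_tag: Optional[str] = None
    for i, token in enumerate(tokens):
        next_cands = cand_list[i + 1] if i + 1 < len(tokens) else None
        chosen = disambiguate(cand_list[i], prev_tag, next_cands)
        tagged.append((token, chosen))
        prev_tag = chosen
    return tagged


if __name__ == "__main__":
    print(tag(["the", "can", "rusts"]))
    print(tag(["they", "can", "fish"]))
```

In this sketch the ambiguous word "can" receives a different tag depending on whether a determiner or a pronoun precedes it, which is the kind of context-driven decision the abstract describes. A tagger for a highly inflectional language would presumably also inspect suffixes and other morphological features, which this toy version leaves out.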
