A comparison of different part-of-speech tagging techniques for text in Bahasa Indonesia

Part-of-speech (POS) tagging is the problem of assigning a part-of-speech tag to each word of a text, and several methods exist for solving it. In this paper, we experimented with several POS tagging techniques for Bahasa Indonesia: two statistical approaches (Unigram and Hidden Markov Models) and Brill's tagger. All of them follow the supervised POS tagging approach, which requires a large annotated training corpus to tag accurately. We applied each technique to available annotated corpora of Bahasa Indonesia, then compared and analyzed the results in terms of accuracy, highlighting the advantages and disadvantages of every technique. The Unigram tagger achieved the highest accuracy, 88.37% on a tagged corpus, outperforming both the HMM and Brill taggers.
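As an illustration of the simplest of the compared techniques, the following is a minimal sketch of a unigram tagger: each word receives the tag it carried most often in the training data, and unseen words fall back to a default tag. This is not the implementation used in the paper. The function names, the toy sentences, the tag labels (`PRP`, `VB`, `NN`), and the choice of `NN` as the fallback tag are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sents):
    """Map each lowercased word to the tag it carries most often in training."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, words, default="NN"):
    """Tag a list of words; unseen words get the (assumed) default tag."""
    return [(w, model.get(w.lower(), default)) for w in words]


def accuracy(model, gold_sents, default="NN"):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        predicted = tag(model, [w for w, _ in sent], default)
        for (_, p), (_, g) in zip(predicted, sent):
            correct += p == g
            total += 1
    return correct / total if total else 0.0


# Hypothetical toy corpus, for illustration only (not the paper's data).
TRAIN = [
    [("saya", "PRP"), ("makan", "VB"), ("nasi", "NN")],
    [("dia", "PRP"), ("minum", "VB"), ("air", "NN")],
]
TEST = [[("saya", "PRP"), ("minum", "VB"), ("kopi", "NN")]]

model = train_unigram(TRAIN)
print(tag(model, ["dia", "makan", "roti"]))
print(accuracy(model, TEST))
```

A unigram tagger ignores context entirely, whereas an HMM tagger also models tag-to-tag transitions and Brill's tagger corrects an initial tagging with learned transformation rules. On a toy corpus this small the accuracy figure says nothing about real performance.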
