Implementation of Kadazan Tagger Based on Brill's Method

We present and evaluate the implementation of Part of Speech (POS) tagging for the Kadazan language using the transformation-based approach. The main purpose of this study is to develop an automatic POS tagger for the Kadazan language, which has never been developed before. POS tagging can tag the Kadazan corpus automatically and can help reduce the ambiguity problem of this language. The approach is implemented in this study to achieve an accuracy higher than, or at least similar to, that of other tagging approaches such as the statistical and the original rule-based approaches. It transforms tags based on a prescribed set of rules. A number of objectives were set in order to achieve the main purpose of this study: first, to apply the lexical and contextual rules for this language; second, to implement Brill's algorithm based on this set of rules; and finally, to determine the effectiveness of Kadazan POS tagging using this approach.
The tagging system was trained using four Kadazan corpora containing 5,663 words in all. Based on the evaluation results, the tagging system achieved around 93% accuracy.

Keywords: Kadazan language; Part of Speech tagger; rule-based

Part of Speech (POS) tagging is a process that reads text in a given language and assigns a part of speech, such as noun, verb, adjective, adverb or pronoun, to every word in the text (corpus). The tagging process can be linked with morphological processes such as the formation of an adjective from a verb. For example, the word for 'charm', which is tagged as a verb, can be transformed into an adjective when a suffix is attached to it. In Natural Language Processing, POS tagging is important because it shows how words relate to each other and how the ordered structure of a sentence can help resolve ambiguity at different levels of analysis. POS tagging has been used in many applications, such as machine translation, speech recognition, information retrieval and dictionaries (WordNet). Hence, the importance of POS tagging cannot be ignored.
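
To make the idea of transforming tags with a prescribed set of rules more concrete, the following minimal Python sketch illustrates a Brill-style tagger in three steps: an initial tag is assigned from a lexicon (with a suffix-based lexical rule for unknown words), contextual rules then rewrite tags based on the preceding tag, and accuracy is measured against a gold standard. It is not the implementation evaluated in this study. The names `LexicalRule`, `ContextualRule`, `initial_tags`, `apply_rules` and `accuracy`, the tag labels, and the toy English lexicon and rules are all assumptions made for illustration; no Kadazan data is shown.

```python
from typing import Dict, List, NamedTuple


class LexicalRule(NamedTuple):
    """Hypothetical lexical rule: an unknown word ending in `suffix` gets `tag`."""
    suffix: str
    tag: str


class ContextualRule(NamedTuple):
    """Hypothetical contextual rule: change `from_tag` to `to_tag`
    when the preceding word is tagged `prev_tag`."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(words: List[str], lexicon: Dict[str, str],
                 lexical_rules: List[LexicalRule],
                 default: str = "NOUN") -> List[str]:
    """Assign a first-guess tag to every word."""
    tags = []
    for word in words:
        key = word.lower()
        if key in lexicon:
            tags.append(lexicon[key])
            continue
        # Unknown word: use the first matching suffix rule, else the default tag.
        guess = next((r.tag for r in lexical_rules if key.endswith(r.suffix)), default)
        tags.append(guess)
    return tags


def apply_rules(tags: List[str], rules: List[ContextualRule]) -> List[str]:
    """Apply each contextual rule in order, scanning left to right.
    Changes take effect immediately within a scan (a simplifying assumption)."""
    tags = list(tags)  # work on a copy
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of positions where the predicted tag equals the gold tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy English data, purely illustrative.
TOY_LEXICON = {"the": "DET", "can": "MODAL", "rusts": "VERB"}
TOY_LEXICAL_RULES = [LexicalRule(suffix="ly", tag="ADV")]
TOY_CONTEXTUAL_RULES = [ContextualRule(from_tag="MODAL", to_tag="NOUN", prev_tag="DET")]

words = ["The", "can", "rusts", "slowly"]
gold = ["DET", "NOUN", "VERB", "ADV"]

start = initial_tags(words, TOY_LEXICON, TOY_LEXICAL_RULES)
final = apply_rules(start, TOY_CONTEXTUAL_RULES)
print(start, accuracy(start, gold))
print(final, accuracy(final, gold))
```

In this toy run the initial tagging mislabels "can" as a modal, and the single contextual rule corrects it because the preceding tag is a determiner. In a full transformation-based tagger, such rules are learned from a tagged training corpus rather than written by hand as they are here.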