A Light Weight Stemmer in Kokborok

Started from the very beginning, Stemming has been playing significant roles in several Natural Language Processing Applications such as information retrieval (IR), machine translation (MT), morph analysis and deciding the part of speech (POS). Several stemmers have been developed for a large number of languages including Indian languages; however no work has been done in Kokborok, a native language of Tripura. In this paper, we have designed a simple rule based stemmer for Kokborok using an affix stripping algorithm. The reduction of inflected words to the stem or root form is performed in the stemmer by stripping the affixes and applying boundary rules where needed. The stemming algorithm has been tested using a corpus of 32578 words and out of which 13044 were uniquely found to have an overall accuracy of 80.02% for minimum suffix stripping algorithm and 85.13% for maximum suffix stripping algorithm.