MarS: A rule-based stemmer for morphologically rich language Marathi

Stemming is a technique that maps morphologically related word forms to a single common term (the stem) without performing a complete morphological analysis. It is used as a preprocessing step in many Natural Language Processing (NLP) applications, such as Information Retrieval (IR), machine translation, parsing, and summarization. The present work explores the application of stemming to the task of information retrieval. In IR, stemming generally serves two main purposes: decreasing index size and increasing system performance. This paper presents a rule-based stemmer for the Marathi language. The proposed stemmer achieves an average accuracy of 79.97% over nine runs when tested on a collection of 4500 unique words from a news corpus. Since this accuracy is satisfactory, the stemmer can be useful in several NLP systems for the Marathi language.
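
To make the idea of rule-based stemming concrete, the following is a minimal sketch of longest-match suffix stripping together with a simple accuracy measure. It is not the MarS rule set: the suffix list, the minimum stem length, and the function names `stem` and `accuracy` are assumptions chosen only for illustration, and the abstract does not specify how the actual rules or the evaluation are defined.

```python
# Illustrative sketch only: the suffixes below are a small hand-picked sample
# of common Marathi case endings, NOT the rules used by the MarS stemmer.
SUFFIXES = ["ाला", "ाचा", "ाची", "ाचे", "ाने", "ात", "ला", "चा", "ची", "चे", "ने"]

# Assumed threshold: never strip a suffix if fewer than this many
# code points would remain in the stem.
MIN_STEM_LEN = 2

# Try longer suffixes first so that the longest matching rule wins.
_ORDERED = sorted(SUFFIXES, key=len, reverse=True)


def stem(word: str) -> str:
    """Strip the longest matching suffix, or return the word unchanged."""
    for suffix in _ORDERED:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (word, expected_stem) pairs that the stemmer gets right."""
    if not pairs:
        return 0.0
    correct = sum(1 for word, expected in pairs if stem(word) == expected)
    return correct / len(pairs)


if __name__ == "__main__":
    # Two inflected forms of the same noun reduce to one stem.
    print(stem("घराला"), stem("घराचा"))
```

Under these assumptions, inflected forms of the same noun collapse to one index term, which is the property that reduces index size in IR. An average accuracy over several runs, as reported above, could then be taken as the mean of `accuracy` over each run's word list.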