Stemming in the language modeling framework

Stemming is the process of collapsing words into their morphological root. For example, the terms addicted, addicting, addictions, addictive, and addicts might be conflated to their stem, addict. Over the years, numerous studies [2, 3, 4] have considered stemming as an external process — either to be ignored or used as a pre-processing step. In this study, we try and provide a fresh perspective to stemming. We are motivated by the observation that stemming can be viewed as a form of smoothing, as a way of improving statistical estimates. This suggests that stemming could be directly incorporated into a language model, which is what we achieve in this paper. Detailed discussions are available in[1].