Paper information - Stemming in the language modeling framework

Stemming in the language modeling framework

Stemming is the process of collapsing words into their morphological root. For example, the terms addicted, addicting, addictions, addictive, and addicts might be conflated to their stem, addict. Over the years, numerous studies [2, 3, 4] have treated stemming as an external process, either to be ignored or to be used as a pre-processing step. In this study, we try to provide a fresh perspective on stemming. We are motivated by the observation that stemming can be viewed as a form of smoothing, that is, as a way of improving statistical estimates. This suggests that stemming could be incorporated directly into a language model, which is what we achieve in this paper. Detailed discussions are available in [1].
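To make the "stemming as smoothing" observation concrete, the following is a minimal sketch and not the model developed in the paper. It assumes a unigram document model in which each word's maximum-likelihood estimate is interpolated with probability mass shared uniformly across its stem class. The stem table `STEM_OF`, the helper names, the mixing weight `lam`, and the uniform within-class distribution are all illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical stem table built from the abstract's example; a real system
# would obtain stem classes from a stemmer or from corpus statistics.
STEM_OF = {
    "addict": "addict", "addicted": "addict", "addicting": "addict",
    "addictions": "addict", "addictive": "addict", "addicts": "addict",
}


def stem(word):
    """Return the stem of a word; unknown words are their own stem."""
    return STEM_OF.get(word, word)


def build_classes(vocab):
    """Group vocabulary words into classes that share a stem."""
    classes = defaultdict(set)
    for w in vocab:
        classes[stem(w)].add(w)
    return dict(classes)


def stem_smoothed_prob(word, doc_tokens, classes, lam=0.7):
    """Estimate P(word | document) with stem-class smoothing.

    lam weights the word's own maximum-likelihood estimate; the remaining
    (1 - lam) takes the class's total mass and spreads it uniformly over
    the class members (an assumption made only for this sketch).
    """
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document must contain at least one token")
    members = classes[stem(word)]  # word is assumed to be in the vocabulary
    p_word = counts[word] / total
    p_class = sum(counts[m] for m in members) / total
    return lam * p_word + (1 - lam) * p_class / len(members)


doc = ["addicts", "addicts", "addictive", "drug"]
vocab = set(STEM_OF) | set(doc)
classes = build_classes(vocab)
p_unseen = stem_smoothed_prob("addicted", doc, classes)
```

In this sketch, "addicted" never occurs in the document, yet it receives non-zero probability because its morphological variants do. That is the sense in which conflation acts like smoothing. The estimates still sum to one over the vocabulary, since each class's mass is only redistributed among its own members.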

James Allan | Giridhar Kumaran

[1] Lisa Ballesteros, et al. Improving stemming for Arabic information retrieval: light stemming and co-occurrence analysis, 2002, SIGIR '02.

[2] Donna K. Harman, et al. How effective is suffixing?, 1991, J. Am. Soc. Inf. Sci.

[3] Jinxi Xu, et al. Evaluating a probabilistic model for cross-lingual information retrieval, 2001, SIGIR '01.

[4] David A. Hull. Stemming Algorithms: A Case Study for Detailed Evaluation, 1996, J. Am. Soc. Inf. Sci.

[5] James Allan, et al. Details on Stemming in the Language Modeling Framework, 2003.