Two Algorithms for Probabilistic Stemming

This chapter describes two algorithms for probabilistic stemming. A probabilistic stemmer detects word stems using a probabilistic or statistical model, with little or no knowledge of the language for which it is built. Alongside the illustration of the two models, the chapter reflects on and analyzes the potential of this approach to stemming in the context of information retrieval.
