Paper information - Development of a stemming algorithm

Development of a stemming algorithm

A stemming algorithm, a procedure to reduce all words with the same stem to a common form, is useful in many areas of computational linguistics and information-retrieval work. While the form of the algorithm varies with its application, certain linguistic problems are common to any stemming procedure. As a basis for evaluation of previous attempts to deal with these problems, this paper first discusses the theoretical and practical attributes of stemming algorithms. Then a new version of a context-sensitive, longest-match stemming algorithm for English is proposed; though developed for use in a library information transfer system, it is of general application. A major linguistic problem in stemming, variation in spelling of stems, is discussed in some detail and several feasible programmed solutions are outlined, along with sample results of one of these methods.
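The idea of context-sensitive, longest-match stemming can be illustrated with a minimal sketch. The ending list, the context conditions, and the names `ENDINGS`, `min_len`, and `stem` below are illustrative assumptions only; they are not the ending list or rules from the paper, and the sketch omits the treatment of spelling variation in stems.

```python
# Toy illustration of longest-match, context-sensitive suffix removal.
# The endings and conditions here are hypothetical, not the paper's own list.

def min_len(n):
    """Return a context condition: the remaining stem must have at least n letters."""
    return lambda stem: len(stem) >= n

# Each ending is paired with a condition that the remaining stem must satisfy.
ENDINGS = {
    "ational": min_len(3),
    "ations": min_len(3),
    "ation": min_len(3),
    "ness": min_len(2),
    "ing": min_len(3),
    "ed": min_len(3),
    "es": min_len(2),
    # Remove a final "s" only if the stem does not itself end in "s".
    "s": lambda stem: len(stem) >= 2 and not stem.endswith("s"),
}

def stem(word):
    """Strip the longest ending whose context condition accepts the remaining stem."""
    word = word.lower()
    # Longest match: try the longest endings first.
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending):
            candidate = word[: -len(ending)]
            if ENDINGS[ending](candidate):
                return candidate
    # No ending applies, so the word is left unchanged.
    return word

if __name__ == "__main__":
    for w in ["relational", "relations", "relation", "kindness", "sing", "glass"]:
        print(w, "->", stem(w))
```

In this sketch the three forms "relational", "relations", and "relation" reduce to one common form, while the context conditions keep short words such as "sing" from being over-stripped.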

Author: Julie Beth Lovins (J. B. Lovins)
