Automatic indexing systems use suffix stripping algorithms to cluster words derived from a common root under the same stem. Currently, removing affixes is either a context-free or a context-sensitive operation, where the context refers to the remaining stem. In this article, we propose a suffixing algorithm that uses grammatical categories to enhance the stemming process. This approach also supports languages other than English. In our case, the language is French, and a morphological analysis is required to remove inflectional suffixes, that is, the morphosyntactic variants of a lemma. After this analysis, we apply a suffix stripping algorithm that uses a dictionary and the grammatical categories to remove derivational suffixes. Our approach always returns a linguistically correct lemma, but not necessarily the “right” one. In our tests, the solution proved attractive, with a mean error rate of 16%. We conclude by explaining why we cannot expect significantly better results with this approach.
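The derivational step described above can be illustrated with a minimal sketch. It is not the algorithm evaluated in this article: the lexicon, the three rules, the category labels, and the function name strip_derivational are all hypothetical and chosen only for illustration. The sketch assumes that a prior morphological analysis has already reduced the word to its lemma and assigned it a grammatical category. A suffix is removed only when the rule matches that category and the resulting form appears in the dictionary with the expected target category.

```python
# Hypothetical toy lexicon: lemma -> grammatical category.
LEXICON = {
    "nation": "NOUN",
    "national": "ADJ",
    "chanter": "VERB",
    "chanteur": "NOUN",
    "rapide": "ADJ",
    "rapidement": "ADV",
    "cheval": "NOUN",
    "final": "ADJ",
}

# Hypothetical derivational rules:
# (suffix, category of the derived word, replacement, category of the base).
RULES = [
    ("ment", "ADV", "", "ADJ"),    # rapidement (ADV) -> rapide (ADJ)
    ("al", "ADJ", "", "NOUN"),     # national (ADJ)   -> nation (NOUN)
    ("eur", "NOUN", "er", "VERB"), # chanteur (NOUN)  -> chanter (VERB)
]


def strip_derivational(lemma, category):
    """Remove one derivational suffix, guarded by category and dictionary.

    Returns the (base, base_category) pair when a rule applies, otherwise
    the input pair unchanged.
    """
    for suffix, source_cat, replacement, target_cat in RULES:
        if category != source_cat or not lemma.endswith(suffix):
            continue
        candidate = lemma[: -len(suffix)] + replacement
        # Accept the candidate only if the dictionary lists it with the
        # category that the rule predicts for the base form.
        if LEXICON.get(candidate) == target_cat:
            return candidate, target_cat
    return lemma, category


if __name__ == "__main__":
    for word, cat in [("national", "ADJ"), ("cheval", "NOUN"), ("final", "ADJ")]:
        print(word, cat, "->", strip_derivational(word, cat))
```

In this sketch, "cheval" is left untouched because the "-al" rule applies only to adjectives, and "final" is left untouched because the candidate "fin" is absent from the toy lexicon. A purely context-free stripper would truncate both.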