An Unsupervised Method to Improve Spanish Stemmer

We evaluate the effectiveness of using our edit distance algorithm to improve an unsupervised, language-independent stemming method. The main idea is to create morphological families by automatically grouping words according to our distance, and then to perform stemming based on that grouping. We evaluated both the capacity of the edit distance algorithm in the word clustering task and the ability of our method to generate the correct stem for Spanish. The method obtained 98% precision in the creation of morphological families and 99.85% correct stemming.
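
The pipeline described in the abstract (group words into families by edit distance, then derive a stem per family) can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not specify their distance, so the standard Levenshtein distance is swapped in here, and the greedy grouping strategy, the `max_distance` threshold, and the longest-common-prefix stem rule are all assumptions made purely for illustration.

```python
from os.path import commonprefix


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance (a stand-in for the paper's own distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def group_families(words, max_distance=2):
    """Hypothetical greedy grouping: a word joins the first family whose
    first member is within max_distance; otherwise it starts a new family."""
    families = []
    for word in words:
        for family in families:
            if levenshtein(word, family[0]) <= max_distance:
                family.append(word)
                break
        else:
            families.append([word])
    return families


def stem_of(family):
    """Hypothetical stem rule: the longest common prefix of the family."""
    return commonprefix(family)


words = ["cantar", "cantas", "cantan", "libro", "libros"]
families = group_families(words)
stems = [stem_of(f) for f in families]
print(families)  # [['cantar', 'cantas', 'cantan'], ['libro', 'libros']]
print(stems)     # ['canta', 'libro']
```

A fixed threshold on raw edit distance is the simplest choice and will over-merge short words; a real system would likely normalize by word length or weight edits by position, which may be closer to what the paper's distance does.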

Rafael Muñoz | Yoan Gutiérrez-Vázquez | Antonio Fernández Orquín | Josval Díaz
