A simple algorithm for the problem of suffix stripping

Suffix stripping is the problem of removing morphological suffixes from a word to obtain its stem. We formulate suffix stripping as an unconstrained optimization problem and develop a simple algorithm that requires no linguistic or morphological knowledge. We demonstrate that the algorithm outperforms an established technique for the English language.
