VBS Stemmer: A vocabulary-based stemmer

Stemming is the procedure of reducing the different morphological variants of a word to a common form. It is widely used in information retrieval and computational linguistics. In this paper, we introduce the Vocabulary Based Stemmer (VBS) as an alternative solution to the stemming problem for applications that rely on semantic relations between words or on dictionaries, and that therefore need stems that are valid words. The vocabulary component of VBS is generated from WordNet. To validate VBS, part of the Cranfield 1400 test collection was used, and the results show significant improvements over previous stemmers.
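The abstract does not spell out the VBS algorithm, so the following is only a minimal sketch of the general idea of vocabulary-validated stemming: a candidate stem is accepted only if it is a valid word in the vocabulary. It assumes a tiny hand-made vocabulary standing in for the WordNet-derived one and a hypothetical list of suffix rules. `vbs_stem`, `VOCABULARY`, and `SUFFIX_RULES` are illustrative names, not the authors' code.

```python
# Hypothetical vocabulary standing in for the WordNet-derived vocabulary
# (an assumption for illustration only).
VOCABULARY = {"connect", "study", "retrieve", "retrieval", "argue"}

# Hypothetical (suffix, replacement) rules; not the rules used by VBS.
SUFFIX_RULES = [
    ("ies", "y"),   # studies  -> study
    ("ied", "y"),   # studied  -> study
    ("ing", ""),    # connecting -> connect
    ("ing", "e"),   # retrieving -> retrieve
    ("ed", ""),     # connected -> connect
    ("ed", "e"),    # argued -> argue
    ("es", ""),
    ("s", ""),      # connects -> connect
]

# Try longer suffixes first; sorted() is stable, so rules with equal
# suffix length keep the order listed above.
_RULES_BY_LENGTH = sorted(SUFFIX_RULES, key=lambda r: len(r[0]), reverse=True)


def vbs_stem(word: str, vocabulary: set[str] = VOCABULARY) -> str:
    """Return a stem that is a valid vocabulary word, or the word unchanged."""
    w = word.lower()
    if w in vocabulary:  # already a valid word: keep it as-is
        return w
    for suffix, replacement in _RULES_BY_LENGTH:
        if w.endswith(suffix) and len(w) > len(suffix):
            candidate = w[: -len(suffix)] + replacement
            if candidate in vocabulary:  # accept only valid words
                return candidate
    return w  # no valid stem found: do not emit a non-word


if __name__ == "__main__":
    for w in ["studies", "connecting", "retrieving", "argued", "xyzzies"]:
        print(w, "->", vbs_stem(w))
    # studies -> study, connecting -> connect, retrieving -> retrieve,
    # argued -> argue, xyzzies -> xyzzies
```

The vocabulary check is what separates this from plain suffix stripping: applying the same rules without it, "retrieving" would first yield the non-word "retriev", whereas the sketch keeps trying rules until it reaches a valid word, and otherwise returns the input unchanged.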
