Design and Development of a Stemmer for Punjabi

Stemming is the process of removing affixes from inflected words without performing a complete morphological analysis. A stemming algorithm is a procedure that reduces all words with the same stem to a common form [20]. It is useful in many areas of computational linguistics and information retrieval. Search engines use stemming so that a query can match documents containing different inflected forms of the same word, and in an information retrieval system a stemmer is used mainly to improve retrieval performance. This paper presents a stemmer for Punjabi that combines a brute-force algorithm with a suffix-stripping technique. Similar techniques can be used to build stemmers for other languages such as Hindi, Bengali and Marathi. The stemmer gives good results and can be effective in an information retrieval system. It also reduces the problems of over-stemming and under-stemming.
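The combination of a brute-force step and suffix stripping described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that "brute force" means looking the word up in a table of known inflected forms, and the table `LOOKUP`, the list `SUFFIXES`, the `min_stem_len` guard and the function name `stem` are all hypothetical. The words are romanized transliterations used only for readability; a real Punjabi stemmer would operate on Gurmukhi text with a much larger lexicon and suffix list.

```python
# Illustrative data only; the paper's actual lexicon and suffix list are not given here.
# Hypothetical brute-force table: inflected form -> root word.
LOOKUP = {
    "munde": "munda",  # "boys" -> "boy" (romanized)
}

# Hypothetical suffix list (romanized), not taken from the paper.
SUFFIXES = ["ian", "an", "e", "a"]


def stem(word, lookup=LOOKUP, suffixes=SUFFIXES, min_stem_len=2):
    """Return a stem for `word` using table lookup, then suffix stripping."""
    # Step 1 (brute force): an exact match in the table wins outright.
    if word in lookup:
        return lookup[word]

    # Step 2 (suffix stripping): try longer suffixes first so that
    # "ian" is preferred over "an" when both match.
    for suffix in sorted(suffixes, key=len, reverse=True):
        remaining = len(word) - len(suffix)
        # The length guard limits over-stemming of very short words.
        if word.endswith(suffix) and remaining >= min_stem_len:
            return word[:remaining]

    # No rule applied: return the word unchanged rather than guessing.
    return word


if __name__ == "__main__":
    for w in ["munde", "mundian", "ghar", "ma"]:
        print(w, "->", stem(w))
```

In this sketch the lookup table handles forms that simple stripping would get wrong, while the minimum stem length is one simple way to limit over-stemming; words that match no rule are left unchanged, which trades some under-stemming for safety.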

[1] Ananthakrishnan Ramanathan, et al. A Lightweight Stemmer for Hindi, 2003.

[2] Bharath Dandala, et al. Evaluating Stemmers and Retrieval Fusion Approaches for Hindi: UNT at FIRE 2010, 2010.

[3] Ababneh M.F. Mohammad, et al. Occurrences Algorithm for String Searching Based on Brute-force Algorithm, 2006.

[4] Gosse Bouma, et al. Accurate Stemming of Dutch for Text Classification, 2001, CLIN.

[5] Haidar M. Harmanani, et al. A Rule-Based Extensible Stemmer for Information Retrieval with Application to Arabic, 2006, Int. Arab J. Inf. Technol.

[6] Patrick Ruch, et al. Evaluation of Stemming, Query Expansion and Manual Indexing Approaches for the Genomic Task, 2005, TREC.

[7] Md. Zahurul Islam, et al. A light weight stemmer for Bengali and its use in spelling checker, 2007.

[8] Tengku Mohd Tengku Sembok, et al. Rules Frequency Order Stemmer for Malay Language, 2009.

[9] Marie-Claire Jenkins, et al. Conservative stemming for search and indexing, 2005.

[10] David A. Hull, et al. A Detailed Analysis of English Stemming Algorithms, 2006.

[11] Swapan K. Parui, et al. A Simple Stemmer for Inflectional Languages, 2008.

[12] Martin F. Porter, et al. An algorithm for suffix stripping, 1997, Program.

[13] Julie Beth Lovins, et al. Development of a stemming algorithm, 1968, Mech. Transl. Comput. Linguistics.

[14] Jörg Caumanns, et al. A fast and simple stemming algorithm for German words, 1999.

[15] James Mayfield, et al. Single n-gram stemming, 2003, SIGIR.

[16] Amna A. Al Kaabi, et al. Arabic Light Stemmer: A New Enhanced Approach, 2005.