A Malay Stemmer for Jawi Characters

The Malay language may be written in either Roman (Rumi) or Jawi characters. Most Malay stemmers handle only Rumi affixes. This paper proposes a stemmer for Jawi characters that uses two sets of rules written in Jawi: one set stems the various forms of derived words, and the other replaces the use of a dictionary by producing the root word for each derivative. The stemmer was tested on 1185 derived words containing prefixes, circumfixes, suffixes, and infixes. The results show that 84.89% of the Jawi root words were successfully stemmed.
