Design Consideration of Malay Text Stemmer Using Structured Approach

A word stemmer (or text stemmer) removes bound morphemes from derived words so that morphological variants are mapped to a common base form. It is commonly used as a preprocessing tool in text classification, text mining, and information retrieval. The design of an effective text stemmer is therefore crucial to ensuring that stemming maps morphological variants to the correct base forms. This paper investigates the design considerations for an effective text stemmer from the perspective of the Malay language. These considerations are based on the challenges that previous researchers have faced in stemming Malay texts. By adopting them, a text stemmer is expected to address common stemming errors and to achieve promising stemming accuracy.
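
To make the idea of mapping morphological variants to a base form concrete, the following is a minimal sketch of affix stripping with a root-word lookup. It is not the stemmer proposed in this paper: the affix lists, the small root-word set, and the function name `stem` are all assumptions chosen for illustration, and a practical Malay stemmer would need far more rules (for example, spelling changes after prefixes, reduplication, and compounding).

```python
# Illustrative sketch only: the affix lists and root-word set below are
# small hypothetical samples, not a complete Malay rule set or dictionary.
PREFIXES = ("meng", "mem", "me", "ber", "di")
SUFFIXES = ("kan", "an", "i")
ROOT_WORDS = {"baca", "main", "makan", "ajar"}


def stem(word):
    """Return the root of `word` if stripping one prefix and/or one suffix
    yields a known root word; otherwise return the word unchanged."""
    word = word.lower()
    # A word that is already a root must not be stripped further.
    if word in ROOT_WORDS:
        return word
    # Try every prefix/suffix combination, including "no prefix" and
    # "no suffix" (represented by the empty string).
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            candidate = rest[:len(rest) - len(suffix)] if suffix else rest
            # Accept a candidate only if the dictionary confirms it.
            if candidate in ROOT_WORDS:
                return candidate
    # No valid root found: leave the word as it is.
    return word


for w in ("membaca", "dibacakan", "makanan", "bermain", "komputer"):
    print(w, "->", stem(w))
```

In this sketch the dictionary check is what prevents a root such as "makan" from being stripped any further, and a word with no matching root is returned unchanged rather than guessed at.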
