Word stemming challenges in Malay texts: A literature review

Word stemming algorithms are used in artificial intelligence applications to reduce derived words to their respective root words. Although many word stemming algorithms for the Malay language have been proposed in past years, there is still a need for an effective algorithm that addresses the stemming challenges arising from the complexity of the Malay language. This paper highlights three main challenges in existing word stemming algorithms: the morphological rules of the Malay language, word patterns in digital Malay texts, and word stemming evaluation. These challenges should be considered in the future development of effective word stemming algorithms for the Malay language.
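To make the idea of stemming a derived word to its root concrete, the following is a minimal sketch of dictionary-checked affix stripping. It is not any of the reviewed algorithms: the function name `stem`, the tiny root-word list, and the small affix subsets are all assumptions chosen for illustration, and a practical Malay stemmer would need a full dictionary and a much richer rule set.

```python
# Hypothetical, tiny root-word list; a real stemmer would use a full dictionary.
ROOT_WORDS = {"makan", "ajar", "main", "baca"}

# Illustrative subsets of Malay prefixes and suffixes (assumed, not exhaustive).
PREFIXES = ("mem", "me", "ber", "di")
SUFFIXES = ("kan", "an", "i")


def stem(word, roots=ROOT_WORDS):
    """Return the root of `word` if stripping known affixes yields a dictionary root."""
    word = word.lower()
    if word in roots:
        # Already a root: do not strip, even if it looks affixed ("makan" ends in "an").
        return word

    # The empty string stands for "strip nothing on this side".
    prefix_options = [""] + [p for p in PREFIXES if word.startswith(p)]
    suffix_options = [""] + [s for s in SUFFIXES if word.endswith(s)]

    for p in prefix_options:
        for s in suffix_options:
            candidate = word[len(p):len(word) - len(s)]
            if candidate in roots:
                return candidate

    # No rule produced a known root: leave the word unchanged.
    return word


for w in ["makanan", "membaca", "dimainkan", "makan", "belajar"]:
    print(w, "->", stem(w))
```

In this sketch, "makanan", "membaca" and "dimainkan" reduce to "makan", "baca" and "main", while "makan" is protected by the dictionary check from being overstemmed. The word "belajar" (from "ajar", where the prefix "ber" surfaces as "bel") is left unstemmed because the assumed prefix list has no rule for that variant, which is a small instance of the morphological-rule challenge the paper describes.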
