MORPHOLOGY ANALYSIS IN MALAY POS PREDICTION

Based on the literature, morphological and syntactic relationships within a sentence can be used to determine the correct tag sequence. Supervised POS tagging faces two major problems: unknown tags and ambiguous tags. Incorrect tags within sentences typically decrease the accuracy of the model (tagger). In this paper, we conduct an experiment comparing theoretical and computational analyses for automatic part-of-speech tagging, specifically using morphological information. The theoretical analysis introduced by experts is discussed, and a computational analysis is performed to evaluate whether the two agree. Our preliminary experimental results show alignment between the theoretical and computational analyses. Two machine learning algorithms, Decision Tree (J48) and k-Nearest Neighbor (kNN), were then evaluated on three basic measures: accuracy, time taken to build the model, and root mean squared (RMS) error. The highest accuracy, 92.86%, was achieved with the Decision Tree (J48) algorithm. Words tagged as Noun (kn), Verb (kk), and Adjective (Adj) are mostly identified successfully using morphological information.
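The abstract does not spell out how morphology becomes classifier input. The sketch below is a minimal illustration under stated assumptions. Each word is reduced to a hypothetical (prefix, suffix) pair by longest-match stripping against a small illustrative list of Malay affixes. It is then tagged by k-nearest neighbours using an overlap distance, which counts mismatched affix slots. The affix inventory, `min_stem`, `k=3`, the toy words, and every function name are assumptions, not the paper's setup. J48 (Weka's implementation of C4.5) is not reproduced here. RMS error is computed as the square root of the mean squared difference between predicted tag probabilities and the 0/1 truth, averaged over tags. We believe this matches how Weka scores nominal classes, but treat that as an assumption.

```python
import math
from collections import Counter

# Hypothetical affix inventory: a small subset of common Malay affixes,
# listed for illustration only (not the feature set used in the paper).
PREFIXES = ("meng", "meny", "peng", "mem", "men", "ber", "ter", "pem", "pen",
            "me", "di", "pe", "ke", "se")
SUFFIXES = ("kan", "nya", "lah", "an", "i")


def affix_features(word, min_stem=3):
    """Return (prefix, suffix) via longest-match stripping; '' means no affix."""
    w = word.lower()
    prefix = next((p for p in sorted(PREFIXES, key=len, reverse=True)
                   if w.startswith(p) and len(w) - len(p) >= min_stem), "")
    rest = w[len(prefix):]
    suffix = next((s for s in sorted(SUFFIXES, key=len, reverse=True)
                   if rest.endswith(s) and len(rest) - len(s) >= min_stem), "")
    return prefix, suffix


def overlap_distance(a, b):
    """Number of affix slots that differ (overlap / Hamming metric)."""
    return sum(x != y for x, y in zip(a, b))


def knn_distribution(train, feats, k=3):
    """Tag distribution over the k nearest training items (sort is stable, so ties keep training order)."""
    ranked = sorted(train, key=lambda item: overlap_distance(item[0], feats))
    neighbours = ranked[:k]
    votes = Counter(tag for _, tag in neighbours)
    return {tag: n / len(neighbours) for tag, n in votes.items()}


def evaluate(train, test, k=3):
    """Return (accuracy, rms_error) of the kNN tagger on (word, tag) pairs."""
    tags = sorted({t for _, t in train} | {t for _, t in test})
    train_feats = [(affix_features(w), t) for w, t in train]
    correct, sq_err = 0, 0.0
    for word, gold in test:
        dist = knn_distribution(train_feats, affix_features(word), k)
        predicted = max(dist, key=dist.get)  # ties go to the nearer neighbour's tag
        correct += predicted == gold
        # Squared error between predicted probability and 0/1 truth, averaged over tags.
        sq_err += sum((dist.get(t, 0.0) - (t == gold)) ** 2 for t in tags) / len(tags)
    return correct / len(test), math.sqrt(sq_err / len(test))


# Toy (word, tag) pairs invented for illustration; tag names follow the abstract.
TRAIN = [("membaca", "kk"), ("menulis", "kk"), ("bermain", "kk"), ("ditulis", "kk"),
         ("pekerjaan", "kn"), ("kebersihan", "kn"), ("makanan", "kn"), ("pelajar", "kn"),
         ("tercantik", "Adj"), ("terbesar", "Adj"), ("cantik", "Adj"), ("besar", "Adj")]
TEST = [("memasak", "kk"), ("kesihatan", "kn"), ("terpanjang", "Adj")]

if __name__ == "__main__":
    acc, rmse = evaluate(TRAIN, TEST)
    print(f"accuracy={acc:.2%}  rms_error={rmse:.4f}")
```

On this toy split the script prints `accuracy=100.00%  rms_error=0.1571`. These numbers come from invented data and say nothing about the 92.86% reported above. Each affix slot is treated as a nominal attribute, which is also the kind of input a decision tree such as J48 would split on. Because of that, the same features could be fed to either classifier.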
