Poetry Classification Using Support Vector Machines

Problem statement: Pantun, a traditional form of Malay poetry, expresses ideas, emotions and feelings in rhyming lines. Malay poetry usually carries broad and deep meaning, which makes it difficult to interpret. Moreover, little work has been done on the automatic classification of literary text such as poetry. Approach: This research concerns the classification of Malay pantun using Support Vector Machines (SVM). SVMs with Radial Basis Function (RBF) and linear kernels were implemented to classify pantun by theme, and to distinguish poetry from non-poetry. A total of 1500 pantun divided into 10 themes, together with 214 Malaysian folklore documents, were used as the training and testing datasets. We used tf-idf features for both classification experiments, and the shape feature only for the poetry versus non-poetry experiment. Results: In each experiment, the linear kernel achieved a higher average accuracy than the RBF kernel. Conclusion: The results show the potential of SVM for classifying poems into various categories, whereas previous approaches focused on classifying prose only.
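
To make the approach concrete, the following is a minimal sketch of the two ingredients named above: tf-idf document vectors and the linear and RBF kernel functions an SVM would compare them with. It is not the authors' implementation. The tf-idf variant (raw term frequency times log(N/df)), the gamma value, the function names and the toy texts are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using raw term frequency times log(N / df).

    This weighting is an assumed variant; the abstract does not say which
    tf-idf formula was used.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def linear_kernel(x, y):
    """Linear kernel: K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical placeholder texts, not taken from the pantun dataset.
toy_docs = ["alpha beta beta", "alpha gamma", "gamma gamma delta"]
vocab, vecs = tfidf_vectors(toy_docs)

# Similarity between the first two documents under each kernel.
# gamma=0.5 is an arbitrary illustrative value, not a tuned parameter.
k_lin = linear_kernel(vecs[0], vecs[1])
k_rbf = rbf_kernel(vecs[0], vecs[1], gamma=0.5)
print(vocab, k_lin, k_rbf)
```

In a full experiment these kernel values would be computed by an SVM library rather than by hand, with the kernel parameters chosen by validation on the training data.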
