Discovering The Applicability of Classification Algorithms With Arabic Poetry

Classification algorithms assign objects that share characteristics to the same group. Arabic poems belong to different eras according to when they were written, and the style of Arabic poetry changed over time. This study therefore tests whether different classification algorithms can assign poems to their correct era. The process can be automated by studying the linguistic changes in Arabic poetry, specifically the words used in hemistichs (a hemistich is one part of a poem line), without considering the poems' rhyme. In this paper, we apply different classification algorithms to distinguish poems written in the Abbasid era from poems written in the Andalusian era. The classifiers were trained on a dataset of 30,058 words from 10,895 poetic hemistichs collected from both eras.¹ In the experiments, the support vector machine classifier reached an accuracy of 70.50% when tested on a sample of random poem lines. The study shows that poems from different eras can be distinguished with decent accuracy by identifying discriminant features for use in classification.

¹ Dataset gathered and retrieved from www.adab.com
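To make the word-based setup concrete, the following is a minimal sketch of a linear support vector machine trained on bag-of-words counts from hemistichs. It is an illustration under stated assumptions, not the study's pipeline: the abstract does not specify the preprocessing, the feature weighting, or the SVM settings, so the hinge-loss subgradient training, the hyperparameters (`epochs`, `rate`, `reg`), and all function names here are hypothetical. The tokens are placeholders and do not come from the dataset.

```python
from collections import Counter
import random

# Placeholder tokens, NOT taken from the paper's dataset: "a*" tokens stand in
# for words seen only in one era, "b*" for the other, "shared" for both.
TRAIN = [
    ("a1 a2 shared", "abbasid"),
    ("a2 a3 shared", "abbasid"),
    ("a1 a3", "abbasid"),
    ("b1 b2 shared", "andalusian"),
    ("b2 b3 shared", "andalusian"),
    ("b1 b3", "andalusian"),
]
SIGN = {"abbasid": 1, "andalusian": -1}  # SVM labels are +1 / -1


def build_vocab(texts):
    """Map every distinct whitespace-separated word to a feature index."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Sparse bag-of-words counts; words outside the vocabulary are ignored."""
    counts = Counter(w for w in text.split() if w in vocab)
    return {vocab[w]: float(c) for w, c in counts.items()}


def score(weights, features):
    """Signed decision value of the linear model (no bias term, for brevity)."""
    return sum(weights[i] * v for i, v in features.items())


def train_linear_svm(rows, vocab, epochs=50, rate=0.1, reg=0.01, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    data = [(vectorize(text, vocab), SIGN[label]) for text, label in rows]
    weights = [0.0] * len(vocab)
    for _ in range(epochs):
        rng.shuffle(data)
        for features, y in data:
            violated = y * score(weights, features) < 1.0
            # Regularization step: shrink every weight slightly.
            weights = [w * (1.0 - rate * reg) for w in weights]
            if violated:
                # Hinge-loss step: push the margin of this sample upward.
                for i, v in features.items():
                    weights[i] += rate * y * v
    return weights


def predict(text, vocab, weights):
    """Return the era whose side of the separating hyperplane the text is on."""
    positive = score(weights, vectorize(text, vocab)) > 0
    return "abbasid" if positive else "andalusian"


def accuracy(rows, vocab, weights):
    """Fraction of (text, label) pairs that are classified correctly."""
    hits = sum(predict(text, vocab, weights) == label for text, label in rows)
    return hits / len(rows)


vocab = build_vocab(text for text, _ in TRAIN)
weights = train_linear_svm(TRAIN, vocab)
held_out = [("a1 a2", "abbasid"), ("b2 b3", "andalusian")]
print(accuracy(held_out, vocab, weights))
```

On real hemistichs the two eras share most of their vocabulary, so the classes would not separate as cleanly as these placeholders do. An accuracy figure such as the reported 70.50% would be measured with the same kind of `accuracy` call on held-out poem lines.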
