Scalable multi-label Arabic text classification

Multi-label text classification (MTC) is a natural extension of traditional text classification (TC) in which each document can be assigned several labels drawn from a possibly large label set. The size of this label space is what makes MTC challenging. Several approaches have been proposed to ease the classification process. One of them is the problem transformation (PT) method, which transforms multi-label data into single-label data suitable for conventional classifiers. This paper presents a detailed study of a supervised approach to the MTC problem for Arabic text, and the experiments also consider the scalability of that approach. The MEKA system is used to convert the multi-label data into single-label data using three PT methods: LC, BR, and RT. Classifiers commonly used for TC, such as SVM, NB, KNN, and Decision Tree, are then applied to each resulting dataset. SVM on the LC dataset produced the best results, with 71% ML-accuracy.
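
The problem transformation idea can be illustrated with a minimal Python sketch. It is not the MEKA implementation and not the paper's pipeline. It assumes that LC denotes a label-combination transformation (each distinct label set becomes one class) and BR a binary-relevance transformation (one yes/no dataset per label). The toy documents, the label names, and the function names `label_combination` and `binary_relevance` are hypothetical and chosen only for illustration. RT is not shown.

```python
from typing import Dict, List, Set, Tuple

# A multi-label document: (text, set of labels). Toy data, not from the paper.
MultiLabelDoc = Tuple[str, Set[str]]


def label_combination(docs: List[MultiLabelDoc]) -> List[Tuple[str, str]]:
    """LC-style transformation (assumed): each distinct label set
    is mapped to a single class name, giving one multi-class dataset."""
    return [(text, "+".join(sorted(labels))) for text, labels in docs]


def binary_relevance(
    docs: List[MultiLabelDoc], label_space: List[str]
) -> Dict[str, List[Tuple[str, bool]]]:
    """BR-style transformation (assumed): one binary dataset per label,
    where the target says whether the document carries that label."""
    return {
        label: [(text, label in labels) for text, labels in docs]
        for label in label_space
    }


toy_docs: List[MultiLabelDoc] = [
    ("football transfer fees", {"sports", "economy"}),
    ("election results", {"politics"}),
    ("stadium funding vote", {"sports", "politics", "economy"}),
]
label_space = ["economy", "politics", "sports"]

lc_data = label_combination(toy_docs)
br_data = binary_relevance(toy_docs, label_space)

for text, combined in lc_data:
    print(f"LC: {text!r} -> {combined}")
for label, rows in br_data.items():
    print(f"BR[{label}]: {[target for _, target in rows]}")
```

Either output can be passed to an ordinary single-label classifier. Under LC the number of classes grows with the number of distinct label sets seen in the data, while under BR the number of datasets grows with the number of labels, which is one reason scalability matters when comparing such transformations.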
