Arabic Text Classification using Feature-Reduction Techniques for Detecting Violence on Social Media

As the number of online users grows, so does the volume of data shared online. Knowledge-discovery techniques applied to these data can yield valuable information for detecting a range of problems, including violence. Violence has been a significant problem worldwide in recent years, particularly in Arab countries. This research therefore focuses on detecting violence-related tweets. Text mining is an important technique for extracting and predicting information from text. In this study, a text classification model is built to detect violence in Arabic-dialect tweets using different feature-reduction approaches. The experiments use three classifiers (bagging, K-nearest neighbors (KNN), and Bayesian boosting) with different feature-extraction methods, namely root-based stemming, light stemming, and n-grams. In addition, the study applies the following feature-reduction techniques: support vector machine (SVM), chi-squared (CHI), the Gini index, correlation, rules, information gain (IG), deviation, symmetrical uncertainty, and the IG ratio. In the experiments, bagging with tri-grams achieved the highest accuracy, 86.61%, and a combination of IG and SVM feature reduction registered an accuracy of 90.59%.
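To make the idea of n-gram extraction followed by information-gain feature reduction concrete, here is a minimal pure-Python sketch. It is not the study's implementation: the function names, the toy English sentences, and the labels are hypothetical placeholders, and the study's actual tooling, preprocessing, and Arabic data are not described here. The sketch scores each word n-gram as a binary "present in the document" feature and keeps the top-scoring ones.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the set of word n-grams in a whitespace-tokenized text."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    return entropy(labels) - conditional


def select_top_k(texts, labels, n=1, k=3):
    """Rank all word n-grams by IG and return the k best as (term, score)."""
    docs = [word_ngrams(t, n) for t in texts]
    vocab = set().union(*docs)
    scored = sorted(((information_gain(docs, labels, t), t) for t in vocab),
                    key=lambda pair: (-pair[0], pair[1]))
    return [(term, round(score, 3)) for score, term in scored[:k]]


# Hypothetical toy corpus (English placeholders, not data from the study).
texts = [
    "they attacked the crowd",
    "the crowd enjoyed the match",
    "he attacked a guard",
    "a quiet match today",
]
labels = ["violent", "neutral", "violent", "neutral"]

print(select_top_k(texts, labels, n=1, k=2))
```

Setting `n=3` would score tri-grams instead of single words; the reduced feature set would then be passed to a classifier such as bagging or KNN.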
