Emotion analysis of Arabic articles and its impact on identifying the author's gender

The Gender Identification (GI) problem is concerned with determining the gender of the author of a given text based on its contents. It is one of the authorship profiling problems, which have a wide range of applications in fields such as marketing and security. Because of its importance, extensive research effort has been invested in the GI problem for different languages. Unfortunately, the same cannot be said for the Arabic language, despite its strategic importance and widespread use. In this work, we explore the GI problem for Arabic text as a supervised learning problem. Specifically, we consider and compare two approaches to feature extraction. The first is the Bag-Of-Words (BOW) approach, while the second is based on computing features related to sentiments and emotions. One goal of this work is to test the validity of the common stereotype that female authors tend to write in a more emotional way than male authors. Our results provide no conclusive evidence that this holds for our dataset.
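To make the contrast between the two feature extraction approaches concrete, the following is a minimal sketch and not the pipeline used in this work. The whitespace tokenizer, the tiny word-emotion lexicon, the emotion labels, and the function names are all assumptions introduced for illustration; the abstract does not specify the actual lexicon, preprocessing, or classifiers.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only). A real study
# would rely on a full word-emotion lexicon covering Arabic vocabulary.
TOY_EMOTION_LEXICON = {
    "سعيد": "joy",      # happy
    "فرح": "joy",       # joy
    "حزين": "sadness",  # sad
    "غاضب": "anger",    # angry
    "خائف": "fear",     # afraid
}
EMOTIONS = ("joy", "sadness", "anger", "fear")


def tokenize(text):
    """Naive whitespace tokenizer; real Arabic text needs proper preprocessing."""
    return text.split()


def bow_features(text, vocabulary):
    """Bag-of-Words: one raw term count per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def emotion_features(text):
    """Emotion features: share of tokens associated with each emotion label."""
    tokens = tokenize(text)
    hits = Counter(
        TOY_EMOTION_LEXICON[t] for t in tokens if t in TOY_EMOTION_LEXICON
    )
    total = max(len(tokens), 1)  # guard against empty documents
    return [hits[emotion] / total for emotion in EMOTIONS]


# "I am happy today but my friend is sad"
doc = "أنا سعيد اليوم ولكن صديقي حزين"
vocab = sorted(set(tokenize(doc)))
bow = bow_features(doc, vocab)
emo = emotion_features(doc)
```

Either feature vector could then be passed, together with the author's gender as the label, to any supervised classifier. The BOW vector grows with the vocabulary, whereas the emotion vector stays at a fixed, small dimension, which is what makes a direct comparison of the two representations informative.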
