Techniques for Improving the Labelling Process of Sentiment Analysis in the Saudi Stock Market

Sentiment analysis is used to assess users’ feedback and comments. Researchers have shown increased interest in the topic in recent years, driven by the spread and expansion of social networks. Users’ feedback and comments are written in unstructured formats, usually in informal language, which presents challenges for sentiment analysis. Arabic poses further challenges because of the complexity of the language and the lack of an available sentiment lexicon. As a result, labelling is carried out by hand, which can lead to mislabelling and misclassification. This inaccuracy creates the need for a relabelling process for Arabic documents that removes noise from the labels. The aim of this study is to improve the labelling process for sentiment analysis. Two approaches were used. First, a neutral class was added to create a framework of reliable tweets with positive, negative, or neutral sentiment. Second, the labelling was improved by relabelling. In this study, relabelling was applied to only seven random features (positive or negative): “earnings” (ارباح), “losses” (خسائر), “green colour” (باللون_الاخضر), “growing” (زياده), “distribution” (توزيع), “decrease” (انخفاض), “financial penalty” (غرامة), and “delay” (تاجيل). Of the 48 tweets documented and examined, 20 were relabelled, and the classification error was reduced by 1.34%.
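To make the idea of feature-based relabelling concrete, the following is a minimal sketch and not the study's actual procedure. It assumes that each feature term carries a fixed polarity and that a tweet is relabelled only when the features it contains all point to one polarity that differs from the manual label. The `FEATURE_POLARITY` map, the polarities assigned in it, the `relabel` function, and the sample tweets are all hypothetical and chosen only for illustration.

```python
# Hypothetical map from feature term to polarity. The abstract lists the
# features but not their polarities, so these assignments are assumptions.
FEATURE_POLARITY = {
    "ارباح": "positive",   # "earnings"
    "خسائر": "negative",   # "losses"
    "زياده": "positive",   # "growing"
    "انخفاض": "negative",  # "decrease"
}


def relabel(tweets):
    """Relabel (text, label) pairs whose label conflicts with their features.

    Returns the relabelled list and the number of labels that changed.
    """
    relabelled = []
    changed = 0
    for text, label in tweets:
        # Whitespace tokenisation only; real Arabic text would need
        # normalisation and stemming, which this sketch omits.
        tokens = text.split()
        polarities = {FEATURE_POLARITY[t] for t in tokens if t in FEATURE_POLARITY}
        # Act only when the features agree on a single polarity.
        if len(polarities) == 1:
            new_label = next(iter(polarities))
            if new_label != label:
                changed += 1
                label = new_label
        relabelled.append((text, label))
    return relabelled, changed


# Illustrative tweets, not taken from the study's data set.
sample = [
    ("الشركة تعلن ارباح", "negative"),  # conflicts with "earnings"
    ("خسائر كبيرة", "negative"),         # already consistent
    ("ارباح و خسائر", "neutral"),        # mixed features, left unchanged
    ("اعلان عام", "neutral"),            # no feature present
]
result, n_changed = relabel(sample)
```

In this sketch, tweets with mixed or absent features keep their manual label, which is one conservative design choice among several; how the study resolves such cases is not stated in the abstract.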
