Impact of Balancing Techniques for Imbalanced Class Distribution on Twitter Data for Emotion Analysis

Continuously advancing technology enhances creativity, simplifies people's lives, and offers the possibility of anticipating and satisfying their unmet needs. Understanding emotions is a crucial part of understanding human behavior, so machines must understand emotions deeply to be able to predict human needs. Most tweets express the sentiments of their users, and Twitter data exhibits an imbalanced class distribution. Most machine learning (ML) algorithms tend to become biased towards the majority classes. Imbalanced class distributions have gained extensive attention because they pose many research challenges and demand efficient approaches for handling imbalanced data sets. The strategies used in this case study to balance the class distribution are handling redundant data, resampling the training data, and data augmentation. Six methods related to these techniques are examined. Experiments on the Twitter dataset show that the merging-minority-classes and shuffle-sentence methods outperform the other techniques.
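As a rough illustration of the two methods reported to work best, the following Python sketch shows one plausible reading of "merging minority classes" (relabeling rare emotion classes into a single class) and "shuffle sentence" (creating extra minority examples by permuting the word order of a tweet). The abstract does not specify the exact procedures, so the function names, the `min_count` threshold, the `"other"` label, and the `target_count` parameter are all assumptions made for this example and not the authors' implementation.

```python
import random
from collections import Counter


def merge_minority_classes(labels, min_count, merged_label="other"):
    """Relabel every class with fewer than min_count examples as merged_label.

    min_count and merged_label are hypothetical parameters for this sketch.
    """
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else merged_label for lab in labels]


def shuffle_sentence(text, rng):
    """Return a copy of text with its whitespace-separated words randomly permuted."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def augment_minority(texts, labels, target_count, seed=0):
    """Append shuffled copies of texts until every class has at least target_count examples.

    Classes that already have target_count or more examples are left unchanged.
    """
    rng = random.Random(seed)
    out_texts, out_labels = list(texts), list(labels)
    by_class = {}
    for text, lab in zip(texts, labels):
        by_class.setdefault(lab, []).append(text)
    for lab, items in by_class.items():
        for _ in range(max(0, target_count - len(items))):
            out_texts.append(shuffle_sentence(rng.choice(items), rng))
            out_labels.append(lab)
    return out_texts, out_labels
```

In such a setup, augmentation would normally be applied to the training split only, so that the evaluation data keeps the original tweets and class proportions.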
