Effect of Sampling Strategies on Fine-grained Emotion Classification in Microblog Text

This study investigates how the diversity of training samples affects machine learning model performance in fine-grained emotion classification. Using four sampling strategies (random sampling, sampling by topic, and two variations of sampling by user), we found that the class distribution of 28 emotion categories differs across the samples produced by each strategy. The strategies are nonetheless complementary: combining them yields sufficiently diverse training examples for the emotion classifiers. With support vector machine (SVM) and Bayesian network learning algorithms, our findings show that a classifier trained on combined data from the four sampling strategies performs better and generalizes better than a classifier trained on data from a single sampling strategy. The main contribution of this study is demonstrating how the diversity of the training samples affects the performance of emotion classifiers.
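To make the idea of differing class distributions concrete, the following is a minimal sketch rather than the study's actual procedure. It assumes each sampling strategy yields a list of emotion labels, compares the label distributions of two samples using Jensen-Shannon divergence (a measure chosen here for illustration, not one named in the study), and pools the samples into a combined training set. The strategy names, the toy labels, and the helper functions `class_distribution` and `js_divergence` are all hypothetical.

```python
from collections import Counter
from math import log2


def class_distribution(labels, categories):
    """Return the relative frequency of each category in `labels`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    Ranges from 0 (identical) to 1 (no overlap).
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


# Toy labels per sampling strategy (illustrative only, not real data).
samples = {
    "random": ["happiness", "happiness", "anger", "none"],
    "topic": ["anger", "anger", "fear", "happiness"],
    "user_a": ["none", "none", "none", "happiness"],
    "user_b": ["fear", "sadness", "sadness", "none"],
}

categories = sorted({label for labels in samples.values() for label in labels})
distributions = {
    name: class_distribution(labels, categories)
    for name, labels in samples.items()
}

# Pairwise divergence shows how much two strategies disagree on class balance.
random_vs_topic = js_divergence(distributions["random"], distributions["topic"])

# Pooling every strategy gives the combined training set.
combined = [label for labels in samples.values() for label in labels]
combined_distribution = class_distribution(combined, categories)

if __name__ == "__main__":
    print(f"JS divergence (random vs. topic): {random_vs_topic:.3f}")
    for category, share in zip(categories, combined_distribution):
        print(f"{category:>10}: {share:.2f}")
```

In this toy setup, the combined sample covers every category that appears in any single sample, which is the intuition behind pooling strategies before training a classifier.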
