Developing Multi-Labelled Corpus of Twitter Short Texts: A Semi-Automatic Method

With the volume of electronic documents growing rapidly in the Digital Media Age, there is an increasing need to extract textual features from online texts to support better communication. Sentiment classification may be a key method for capturing the emotions expressed in online communication, and developing emotion-annotated corpora is the first step towards it. However, manual annotation is labour-intensive and costly, which has led to a shortage of corpora for emotional words. Furthermore, single-label semantic corpora can hardly meet the requirements of modern analysis of users' complicated emotions, yet tagging emotional words with multiple labels is even more difficult than single-label tagging. Improved methods for automatically tagging text with multiple emotion labels, so that new semantic corpora can be constructed, are urgently needed. Taking Twitter short texts as a case, this study proposes a new semi-automatic method to annotate Internet short texts with multiple labels and form a multi-labelled corpus for further algorithm training. Each sentence is tagged with both its emotional tendency and its polarity, and each tweet, which generally contains several sentences, is tagged with its two most prominent emotional tendencies. The semi-automatic multi-labelled annotation proceeds in four steps: selecting the base corpus and emotional tags, preprocessing the data, annotating automatically through word matching and weight calculation, and correcting manually in cases where multiple emotional tendencies are found. Experiments on the published Sentiment140 Twitter corpus demonstrate the effectiveness of the proposed approach and show that the results of semi-automatic annotation are consistent with those of manual annotation. By applying this method, this study summarises the annotation specification and constructs a multi-labelled emotion corpus of 6500 tweets for further algorithm training.
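
The automatic annotation step (word matching and weight calculation, followed by a flag for manual correction) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicon, its weights, the emotion-to-polarity mapping, the sentence-splitting rule, and the function names `score_sentence` and `annotate_tweet` are all hypothetical, since the abstract does not specify them.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> (emotion, weight). Not taken from the paper.
LEXICON = {
    "happy": ("joy", 1.0),
    "love": ("joy", 0.8),
    "sad": ("sadness", 1.0),
    "miss": ("sadness", 0.6),
    "angry": ("anger", 1.0),
    "hate": ("anger", 0.9),
}
# Assumed mapping from emotion label to polarity.
POLARITY = {"joy": "positive", "sadness": "negative", "anger": "negative"}


def score_sentence(sentence):
    """Sum lexicon weights per emotion for the words matched in one sentence."""
    scores = Counter()
    for token in re.findall(r"[a-z']+", sentence.lower()):
        if token in LEXICON:
            emotion, weight = LEXICON[token]
            scores[emotion] += weight
    return scores


def annotate_tweet(tweet):
    """Tag each sentence with an emotion and polarity, and the tweet with its top two emotions."""
    # Naive sentence split on terminal punctuation (an assumption for this sketch).
    sentences = [s.strip() for s in re.split(r"[.!?]+", tweet) if s.strip()]
    tweet_scores = Counter()
    sentence_tags = []
    needs_review = False
    for sentence in sentences:
        scores = score_sentence(sentence)
        tweet_scores.update(scores)  # accumulate weights at tweet level
        if not scores:
            sentence_tags.append((sentence, None, "neutral"))
            continue
        if len(scores) > 1:
            # Several emotional tendencies in one sentence: flag for manual correction.
            needs_review = True
        emotion = scores.most_common(1)[0][0]
        sentence_tags.append((sentence, emotion, POLARITY[emotion]))
    top_two = [emotion for emotion, _ in tweet_scores.most_common(2)]
    return {
        "sentences": sentence_tags,
        "tweet_emotions": top_two,
        "needs_review": needs_review,
    }


result = annotate_tweet("I love my dog. I hate Mondays and I am sad!")
print(result["tweet_emotions"], result["needs_review"])
```

In this sketch a tweet is sent to manual review whenever a single sentence matches words from more than one emotion; a real pipeline would likely use a fuller emotion dictionary and a more careful tokeniser for Twitter text.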
