UMUTeam at TASS 2020: Combining Linguistic Features and Machine-learning Models for Sentiment Classification

This paper describes the participation of UMUTeam in the TASS 2020 Workshop on Sentiment Analysis, in which two tasks were proposed. The first task consists of classifying tweets written in several varieties of Spanish according to their general sentiment, whereas the second task consists of a fine-grained distinction among the six basic emotions described by Ekman (2009). Our proposal is based on linguistic features, used alone or in combination with word embeddings. Specifically, we test Convolutional Neural Networks and Support Vector Machines with sentence embeddings. Although our proposal did not achieve the best results, we obtained the best precision in emotion detection (Task 2) and competitive results in the general sentiment classification setting, in which tweets written in different varieties of Spanish were mixed. We consider that our proposal, despite its limitations, provides substantial benefits, such as the interpretability of the results.

[1] J. Pennebaker, et al. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods, 2010.

[2] Yoon Kim, et al. Convolutional Neural Networks for Sentence Classification, 2014, EMNLP.

[3] Miguel Ángel García Cumbreras, et al. Overview of TASS 2020: Introducing Emotion Detection, 2020, IberLEF@SEPLN.

[4] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SKDD.

[5] Paul Ekman, et al. Lie Catching and Microexpressions, 2009.

[6] Vivek K. Singh, et al. Toward Multimodal Cyberbullying Detection, 2017, CHI Extended Abstracts.

[7] Rafael Valencia-García, et al. Review of English literature on figurative language applied to social networks, 2019, Knowledge and Information Systems.

[8] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.

[9] Miguel Ángel Rodríguez-García, et al. Automatic detection of satire in Twitter: A psycholinguistic-based approach, 2017, Knowl. Based Syst.

[10] Nathalie Aussenac-Gilles, et al. A study on LIWC categories for opinion mining in Spanish reviews, 2014, J. Inf. Sci.

[11] Carlo Strapparava, et al. EmoEvent: A Multilingual Emotion Corpus based on different Events, 2020, LREC.

[12] Helen Christensen, et al. A Linguistic Analysis of Suicide-Related Twitter Posts, 2017, Crisis.

[13] Prakhar Gupta, et al. Learning Word Vectors for 157 Languages, 2018, LREC.

[14] Rafael Valencia-García, et al. Ontology-driven aspect-based sentiment analysis classification: An infodemiological case study regarding infectious diseases in Latin America, 2020, Future Generation Computer Systems.