Word Embeddings and Deep Learning for Spanish Twitter Sentiment Analysis

Spanish is the third most used language on the internet. However, Natural Language Processing research in Spanish still lags far behind research in other languages such as English. The aim of this paper is to help fill this gap in the literature and to provide a comprehensive assessment of Deep Learning applied to Spanish sentiment analysis. We focus on the polarity detection task, which remains challenging in the context of Spanish Twitter messages. To do so, we explore the combination of several word representations (Word2Vec, GloVe, FastText) and Deep Neural Network models. In contrast to the poor performance reported by previous related work using Deep Learning for Spanish sentiment analysis, we show promising results. Our best setting combines three word embedding representations, Convolutional Neural Networks and Recurrent Neural Networks. This setup allows us to obtain state-of-the-art results on the TASS/SEPLN 2017 Spanish Twitter benchmark dataset in terms of accuracy and macro F1-measure.
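As a rough illustration of the two evaluation measures named above, the following sketch computes accuracy and macro-averaged F1 for a polarity classifier. The function names, the label set and the toy predictions are assumptions made for this example and are not taken from the paper. The sketch averages per-class F1 scores, which is one common definition of macro F1; the official benchmark scorer may compute it differently (for instance from macro-averaged precision and recall).

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of messages whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (one common macro-F1 variant)."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data with hypothetical polarity labels (illustrative only).
gold = ["P", "N", "NEU", "NONE", "P", "N"]
pred = ["P", "N", "P", "NONE", "N", "N"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

Because macro F1 weights every class equally, a class that is rarely predicted correctly (here the neutral one) pulls the score down much more than it affects accuracy.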
