Towards Emotion Recognition in Hindi-English Code-Mixed Data: A Transformer Based Approach

In the last few years, emotion detection in social-media text has become a popular problem because of its wide-ranging applications: understanding consumers, psychology, human-computer interaction, and the design of smart systems. The availability of huge amounts of social-media data, in which users regularly express sentiments and opinions, has drawn further attention to the problem. In this paper, we present a Hinglish (Hindi-English code-mixed) dataset labelled for emotion detection. We describe a deep-learning-based approach for detecting emotions in Hindi-English code-mixed tweets using bilingual word embeddings derived from FastText and Word2Vec. We experiment with various deep learning models, including CNNs, LSTMs, and bi-directional LSTMs (with and without attention), along with transformers such as BERT, RoBERTa, and ALBERT. The transformer-based BERT model gives the best performance, with an accuracy of 71.43%, outperforming all current state-of-the-art models.
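As a rough illustration of the "with attention" variants mentioned above, the following is a minimal sketch of attention pooling over per-token hidden states, such as those a bi-directional LSTM might produce. It is not the paper's implementation: the function names `softmax` and `attention_pool`, the dot-product scoring, the `query` vector, and the toy numbers are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool per-token vectors into one sentence vector.

    Each token vector is scored by its dot product with `query`
    (a hypothetical learned vector); the scores are normalized with
    softmax and used as weights in a weighted sum of the token vectors.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return pooled, weights


# Toy hidden states for a three-token tweet (illustrative values only).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
pooled, weights = attention_pool(hidden_states, query)
```

In a model of this kind, the pooled vector would typically be passed to a classification layer over the emotion labels; without attention, a common alternative is to use the final hidden state or a plain average instead.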
