SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets

In this paper, we present the results of SemEval-2020 Task 9 on Sentiment Analysis of Code-Mixed Tweets (SentiMix 2020). We also release and describe our Hinglish (Hindi-English) and Spanglish (Spanish-English) corpora, annotated with word-level language identification and sentence-level sentiment labels. These corpora comprise 20K and 19K examples, respectively. The sentiment labels are Positive, Negative, and Neutral. SentiMix attracted 89 submissions in total: 61 teams participated in the Hinglish contest and 28 submitted systems to the Spanglish competition. The best systems achieved F1 scores of 75.0% for Hinglish and 80.6% for Spanglish. We observe that BERT-like models and ensemble methods were the most common and most successful approaches among the participants.
