FII-UAIC at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text Using CNN

The “Sentiment Analysis for Code-Mixed Social Media Text” task at SemEval 2020 focuses on sentiment analysis of code-mixed social media text, specifically on English mixed with Spanish (Spanglish) and with Hindi (Hinglish). In this paper, we present a system that classifies code-mixed Spanish-English tweets as positive, negative, or neutral. First, we built a classifier that assigns the corresponding sentiment labels; besides the sentiment labels, we provide language labels at the word level. Second, we generate a word-level representation using a Convolutional Neural Network (CNN) architecture. Our solution shows promising results on the Sentimix Spanglish-English task (0.744), where our team, Lavinia_Ap, placed 9th. For the Sentimix Hindi-English task (0.324), however, the results need improvement.
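To illustrate the general idea of a CNN over word-level representations, the following is a minimal sketch, not the system described in this paper. It assumes a single convolutional layer with ReLU and max-over-time pooling, followed by a softmax over the three sentiment classes. All names (`init_params`, `conv1d_max_pool`, `classify`), the toy vocabulary, the layer sizes, and the random untrained weights are hypothetical and chosen only for illustration.

```python
import numpy as np

LABELS = ["negative", "neutral", "positive"]


def init_params(vocab_size, emb_dim=16, n_filters=8, width=3, seed=0):
    """Random, untrained weights (illustrative sizes, not taken from the paper)."""
    rng = np.random.default_rng(seed)
    return {
        "embedding": rng.normal(0.0, 0.1, (vocab_size, emb_dim)),
        "filters": rng.normal(0.0, 0.1, (n_filters, width, emb_dim)),
        "filter_bias": np.zeros(n_filters),
        "dense": rng.normal(0.0, 0.1, (len(LABELS), n_filters)),
        "dense_bias": np.zeros(len(LABELS)),
    }


def conv1d_max_pool(embeddings, filters, bias):
    """Slide each filter over the token sequence and keep its maximum response.

    embeddings: (seq_len, emb_dim), filters: (n_filters, width, emb_dim).
    """
    width = filters.shape[1]
    seq_len, emb_dim = embeddings.shape
    if seq_len < width:
        # Zero-pad short inputs so that at least one window exists.
        padding = np.zeros((width - seq_len, emb_dim))
        embeddings = np.vstack([embeddings, padding])
        seq_len = width
    # All contiguous windows of `width` tokens: (n_windows, width, emb_dim).
    windows = np.stack(
        [embeddings[i:i + width] for i in range(seq_len - width + 1)]
    )
    # Dot product of every window with every filter: (n_windows, n_filters).
    feature_maps = np.tensordot(windows, filters, axes=([1, 2], [1, 2])) + bias
    feature_maps = np.maximum(feature_maps, 0.0)  # ReLU
    return feature_maps.max(axis=0)  # max-over-time pooling


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def classify(tokens, vocab, params):
    """Map a tokenized tweet to a sentiment label and class probabilities."""
    ids = np.array(
        [vocab.get(t.lower(), vocab["<unk>"]) for t in tokens], dtype=int
    )
    embedded = params["embedding"][ids]
    pooled = conv1d_max_pool(embedded, params["filters"], params["filter_bias"])
    probs = softmax(params["dense"] @ pooled + params["dense_bias"])
    return LABELS[int(np.argmax(probs))], probs


# Toy vocabulary mixing English and Spanish tokens (hypothetical).
vocab = {"<unk>": 0, "i": 1, "love": 2, "esta": 3, "película": 4}
params = init_params(len(vocab))
label, probs = classify(["I", "love", "esta", "película"], vocab, params)
print(label, probs)
```

Because the weights are untrained, the predicted label is arbitrary. The sketch only shows how token embeddings flow through convolution, pooling, and a three-way softmax.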
