Understanding Language Preference for Expression of Opinion and Sentiment: What do Hindi-English Speakers do on Twitter?

Linguistic research on multilingual societies has indicated that there is usually a preferred language for expression of emotion and sentiment (Dewaele, 2010). A paucity of data has limited such studies to participant interviews and speech transcriptions from small groups of speakers. In this paper, we report a study on 430,000 unique tweets from Indian users, specifically Hindi-English bilinguals, to understand the language of preference, if any, for expressing opinion and sentiment. To this end, we develop classifiers for detecting opinion in these languages and for further classifying opinionated tweets as positive, negative, or neutral in sentiment. Our study indicates that Hindi (i.e., the native language) is preferred over English for expression of negative opinion and swearing. As an aside, we explore some common pragmatic functions of code-switching through sentiment detection.
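
The two-stage labelling described above (opinion detection first, then sentiment classification) and the per-language comparison it enables can be sketched as follows. This is a minimal illustration only, not the classifiers used in the study: the keyword rules, the function names, and the toy tweets are all hypothetical stand-ins for trained models and real data.

```python
from collections import Counter, defaultdict

def label_tweet(text, is_opinion, sentiment_of):
    """Two-stage cascade: detect opinion first, then assign a sentiment."""
    if not is_opinion(text):
        return "non-opinion"
    return sentiment_of(text)

def sentiment_shares(labelled):
    """Per language, the share of each sentiment among opinionated tweets."""
    counts = defaultdict(Counter)
    for lang, label in labelled:
        if label != "non-opinion":
            counts[lang][label] += 1
    return {
        lang: {label: n / sum(c.values()) for label, n in c.items()}
        for lang, c in counts.items()
    }

# Hypothetical keyword rules standing in for trained classifiers.
NEGATIVE = {"bad", "bura"}
POSITIVE = {"good", "accha"}
OPINION_WORDS = NEGATIVE | POSITIVE

def toy_is_opinion(text):
    return any(word in OPINION_WORDS for word in text.lower().split())

def toy_sentiment(text):
    words = set(text.lower().split())
    if words & NEGATIVE:
        return "negative"
    if words & POSITIVE:
        return "positive"
    return "neutral"

# Toy (language, text) pairs; the language tags are assumed to be given.
tweets = [
    ("hi", "yeh bahut bura hai"),
    ("hi", "accha laga"),
    ("hi", "bura din"),
    ("en", "good match today"),
    ("en", "train at 5 pm"),
]
labelled = [(lang, label_tweet(text, toy_is_opinion, toy_sentiment))
            for lang, text in tweets]
shares = sentiment_shares(labelled)
```

Comparing `shares["hi"]` with `shares["en"]` gives the kind of per-language sentiment distribution on which a language-preference claim would rest. A real analysis would replace the keyword rules with trained models and would need to identify the language of each tweet first.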

[1]  Pieter Muysken,et al.  One Speaker, Two Languages: Cross-Disciplinary Perspectives on Code-Switching , 1995 .

[2]  Yael Maschler The Language Games Bilinguals Play: Language Alternation at Language Game Boundaries. , 1991 .

[3]  Yael Maschler ‘Appreciation ha’araxa ’o ha’aratsa?’ [‘valuing or admiration’]: Negotiating contrast in bilingual disagreement talk , 1994 .

[4]  Jean‐Marc Dewaele,et al.  Emotions in Multiple Languages , 2010, Modern Language Review.

[5]  Preslav Nakov,et al.  SemEval-2015 Task 10: Sentiment Analysis in Twitter , 2015, *SEMEVAL.

[6]  Jure Leskovec,et al.  A computational approach to politeness with application to social factors , 2013, ACL.

[7]  Yang Liu,et al.  Learning to Predict Code-Switching Points , 2008, EMNLP.

[8]  Caroline Brun Learning Opinionated Patterns for Contextual Opinion Detection , 2012, COLING.

[9]  Eric Horvitz,et al.  Predicting Depression via Social Media , 2013, ICWSM.

[10]  Eneko Agirre Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation , 2012 .

[11]  Namita Mittal,et al.  Sentiment Analysis of Hindi Reviews based on Negation and Discourse Relation , 2013 .

[12]  Miwa Nishimura,et al.  A functional analysis of Japanese/English code-switching , 1995 .

[13]  Nanyun Peng,et al.  Learning Polylingual Topic Models from Code-Switched Social Media Documents , 2014, ACL.

[14]  Ashequl Qadir Detecting Opinion Sentences Specific to Product Features in Customer Reviews using Typed Dependency Relations , 2009 .

[15]  Jatin Sharma,et al.  POS Tagging of English-Hindi Code-Mixed Social Media Content , 2014, EMNLP.

[16]  Mohammed J. Zaki,et al.  Characterizing the effectiveness of twitter hashtags to detect and track online population sentiment , 2012, CHI Extended Abstracts.

[17]  Somnath Banerjee,et al.  Overview of FIRE-2015 Shared Task on Mixed Script Information Retrieval , 2015, FIRE Workshops.

[18]  Jatin Sharma,et al.  “I am borrowing ya mixing?” An Analysis of English-Hindi Code Mixing in Facebook , 2014, CodeSwitch@EMNLP.

[19]  Soroush Vosoughi,et al.  Tweet Acts: A Speech Act Classifier for Twitter , 2016, ICWSM.

[20]  Julia Hirschberg,et al.  Overview for the First Shared Task on Language Identification in Code-Switched Data , 2014, CodeSwitch@EMNLP.

[21]  Niloy Ganguly,et al.  A Novel Two-stage Framework for Extracting Opinionated Sentences from News Articles , 2014, TextGraphs@EMNLP.

[22]  Yang Liu,et al.  Part-of-Speech Tagging for English-Spanish Code-Switched Text , 2008, EMNLP.

[23]  Rakesh Chandra Balabantaray,et al.  Text normalization of code mix and sentiment analysis , 2015, 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI).

[24]  Miguel A. Alonso,et al.  Sentiment Analysis on Monolingual, Multilingual and Code-Switching Twitter Corpora , 2015, WASSA@EMNLP.

[25]  김혜숙,et al.  Sociolinguistics , 2004, Language Teaching.

[26]  Saif Mohammad,et al.  #Emotional Tweets , 2012, *SEMEVAL.

[27]  Saif Mohammad,et al.  Crowdsourcing a Word–Emotion Association Lexicon , 2013, Comput. Intell.

[28]  Pushpak Bhattacharyya,et al.  A Sentiment Analyzer for Hindi Using Hindi Senti Lexicon , 2014, ICON.

[29]  Daniele Quercia,et al.  Emoticons and Phrases: Status Symbols in Social Media , 2014, ICWSM.

[30]  Jatin Sharma,et al.  Query word labeling and Back Transliteration for Indian Languages: Shared task system description , 2013 .

[31]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[32]  David Yarowsky,et al.  Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams , 2013, ACL.

[33]  Saif Mohammad,et al.  NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets , 2013, *SEMEVAL.

[34]  Inma Muñoa Barredo Pragmatic Functions of Code-Switching among Basque-Spanish Bilinguals , 2003 .

[35]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res.

[36]  Rana D. Parshad,et al.  What is India speaking? Exploring the “Hinglish” invasion , 2016 .

[37]  Joachim Wagner,et al.  Code Mixing: A Challenge for Language Identification in the Language of Social Media , 2014, CodeSwitch@EMNLP.

[38]  F. Grosjean Bilingual: Life and Reality , 2010 .

[39]  Estudios de,et al.  Blistering barnacles! What language do multilinguals swear in?! , 2007 .

[40]  Peter Auer,et al.  One speaker, two languages: The pragmatics of code-switching: a sequential approach , 1995 .