Sentiment analysis of code-mix script

With the advent of social media and networking sites, people can communicate with each other more easily and frequently than ever before. Analyzing the content of these communications can benefit governments and corporations across various industries: it allows them to gauge public sentiment on a multitude of items and issues and to take the necessary action. To do so, however, one must first decipher the content, which is usually a complicated mix of multiple languages. On Indian social media, users often combine Romanized English with their mother tongue. In this paper, we present a number of techniques to identify the sentiment of text after normalizing the influence of multiple languages.
