Sarcasm detection in mash-up language using soft-attention based bi-directional LSTM and feature-rich CNN

Abstract Analyzing explicit and clear sentiment is challenging owing to the growing use of emblematic and multilingual language constructs. This research proposes sarcasm detection using deep learning in code-switched tweets, specifically the mash-up of English with the Indian native language Hindi. The proposed model is a hybrid of a bidirectional long short-term memory (BiLSTM) network with a softmax attention layer and a convolutional neural network (CNN) for real-time sarcasm detection. To evaluate the performance of the proposed model, real-time mash-up tweets are extracted from trending political (#government) and entertainment (#cricket, #bollywood) posts on Twitter. The randomly sampled dataset contains 3000 sarcastic and 3000 non-sarcastic bilingual Hinglish (Hindi + English) tweets. Feature engineering combines three inputs: pre-trained GloVe word embeddings, which provide the English semantic context vector; hand-crafted features from the subjective lexicon Hindi-SentiWordNet, which produce the SentiHindi feature vector; and an auxiliary pragmatic feature vector holding the counts of pragmatic markers in the tweet. A performance analysis compares and validates the proposed softAtt BiLSTM-feature-rich CNN model. The model outperforms the baseline deep learning models, with a classification accuracy of 92.71% and an F-measure of 89.05%.
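The auxiliary pragmatic feature vector can be illustrated with a minimal sketch. The abstract does not say which pragmatic markers are counted, so the marker set below (exclamation marks, question marks, hashtags, mentions, fully capitalised words, simple emoticons), the name `PRAGMATIC_PATTERNS`, and the function `pragmatic_feature_vector` are all assumptions made for illustration and are not the authors' implementation.

```python
import re
from typing import List

# Hypothetical marker set: the abstract does not list the markers it counts.
PRAGMATIC_PATTERNS = {
    "exclamations": re.compile(r"!"),
    "questions": re.compile(r"\?"),
    "hashtags": re.compile(r"#\w+"),
    "mentions": re.compile(r"@\w+"),
    "all_caps_words": re.compile(r"\b[A-Z]{2,}\b"),
    "emoticons": re.compile(r"[:;]-?[)(DPp]"),
}


def pragmatic_feature_vector(tweet: str) -> List[int]:
    """Return one count per marker, in the insertion order of PRAGMATIC_PATTERNS."""
    return [len(pattern.findall(tweet)) for pattern in PRAGMATIC_PATTERNS.values()]


# Made-up Hinglish example tweet, not taken from the paper's dataset.
tweet = "Wah!! kya batting hai, GREAT job @team :) #cricket"
print(dict(zip(PRAGMATIC_PATTERNS, pragmatic_feature_vector(tweet))))
```

In a pipeline like the one the abstract describes, a count vector of this kind would be supplied alongside the GloVe and SentiHindi vectors as an additional input; how the three are combined is not specified in the abstract.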
