Detecting Offensive Tweets in Hindi-English Code-Switched Language

The rapid rise of social media platforms such as Twitter, Facebook and Reddit in linguistically diverse regions has led to the hybridization of popular native languages with English in an effort to ease communication. This paper focuses on the classification of offensive tweets written in Hinglish, a portmanteau of Hindi and English that refers to the Indic language Hindi mixed with English and written in the Roman script. We introduce a novel dataset, the Hindi-English Offensive Tweet (HEOT) dataset, consisting of tweets in Hindi-English code-switched language split into three classes: non-offensive, abusive and hate-speech. We then approach the classification of tweets in the HEOT dataset using transfer learning: the proposed model, which employs Convolutional Neural Networks, is pre-trained on English tweets and then retrained on Hinglish tweets.
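The two-stage training schedule described above (pre-train on English, then retrain on Hinglish) can be illustrated with a minimal sketch. This is not the paper's model: to stay dependency-free it swaps the Convolutional Neural Network for a bag-of-words softmax classifier, and the tiny corpora, the placeholder tokens (`insult_a`, `slur_b`, `group_x`) and all function names are assumptions made up for illustration. Only the three class labels and the order of the training stages come from the abstract.

```python
import math

CLASSES = ["non-offensive", "abusive", "hate-speech"]

# Toy corpora with placeholder tokens (hypothetical, not from the HEOT dataset).
ENGLISH = [
    ("have a nice day", 0), ("see you soon friend", 0),
    ("you are insult_a", 1), ("such a insult_a person", 1),
    ("group_x are slur_a", 2), ("slur_a people go away", 2),
]
HINGLISH = [
    ("aap kaise ho dost", 0), ("kal milte hain friend", 0),
    ("tum insult_b ho", 1), ("kitna insult_b hai", 1),
    ("group_x slur_b hai", 2), ("slur_b log chale jao", 2),
]

def build_vocab(corpus):
    """Map every token in the corpus to a feature index."""
    vocab = {}
    for text, _ in corpus:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    """Bag-of-words counts; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def predict_proba(weights, x):
    """Softmax over one linear score per class."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(weights, data, vocab, epochs=200, lr=0.5):
    """SGD on cross-entropy loss; updates `weights` in place."""
    for _ in range(epochs):
        for text, label in data:
            x = featurize(text, vocab)
            probs = predict_proba(weights, x)
            for c in range(len(CLASSES)):
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j, xj in enumerate(x):
                    weights[c][j] -= lr * grad * xj
    return weights

def predict(weights, text, vocab):
    """Return the index of the most probable class."""
    probs = predict_proba(weights, featurize(text, vocab))
    return max(range(len(CLASSES)), key=probs.__getitem__)

# One shared vocabulary so that weights learned in stage 1 carry over.
vocab = build_vocab(ENGLISH + HINGLISH)
weights = [[0.0] * len(vocab) for _ in CLASSES]

train(weights, ENGLISH, vocab)   # stage 1: pre-train on English
train(weights, HINGLISH, vocab)  # stage 2: retrain the same weights on Hinglish

# "insult_a" occurs only in the English data; "yeh" is out of vocabulary.
print(CLASSES[predict(weights, "yeh insult_a hai", vocab)])
```

Because stage 2 starts from the stage 1 weights instead of re-initializing them, evidence learned from English tokens is still available when a code-switched tweet mixes those tokens with Hindi ones. In a CNN the same idea would apply to the embedding and convolutional layers rather than to per-token weights.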
