Code-switching patterns can be an effective route to improve performance of downstream NLP applications: A case study of humour, sarcasm and hate speech detection

In this paper we demonstrate how code-switching patterns can be utilised to improve downstream NLP applications. In particular, we encode several switching features to improve humour, sarcasm and hate speech detection. We believe that this simple linguistic observation may also help improve other similar NLP applications.
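
As a rough illustration of what encoding switching features could look like, the sketch below derives a few simple statistics from per-token language tags. The abstract does not specify the features actually used, so everything here is an assumption made for illustration: the function name `switching_features`, the tag values `"en"` and `"hi"`, and the particular statistics chosen are all hypothetical.

```python
from typing import Dict, List


def switching_features(lang_tags: List[str]) -> Dict[str, float]:
    """Compute illustrative switching statistics from per-token language tags.

    The tags "en" and "hi" and the chosen statistics are assumptions for
    this sketch, not the feature set described in the paper.
    """
    n = len(lang_tags)
    if n == 0:
        return {"n_switches": 0.0, "switch_rate": 0.0, "frac_hi": 0.0, "frac_en": 0.0}

    # A switch occurs wherever two adjacent tokens carry different tags.
    switches = sum(1 for a, b in zip(lang_tags, lang_tags[1:]) if a != b)

    return {
        "n_switches": float(switches),
        # Fraction of adjacent token pairs that switch language.
        "switch_rate": switches / (n - 1) if n > 1 else 0.0,
        "frac_hi": lang_tags.count("hi") / n,
        "frac_en": lang_tags.count("en") / n,
    }


# Hypothetical tag sequence for a five-token code-mixed sentence.
tags = ["en", "en", "hi", "hi", "en"]
features = switching_features(tags)
```

A vector of this kind could be concatenated with the usual text features before being passed to a classifier; whether it helps on a given task would need to be checked empirically.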
