Aggression Detection on Social Media Text Using Deep Neural Networks

In the past few years, bullying and aggressive posts on social media have grown significantly, with serious consequences for victims and users of all demographics. The majority of work in this field has been done for English only. In this paper, we introduce a deep learning based classification system that detects aggressive behaviour by or towards users in Hindi-English code-mixed Facebook posts and comments. Our work focuses on text from users primarily in the Indian subcontinent. The dataset used for our models was provided by the TRAC-1 shared task. Our classification model assigns each Facebook post or comment to one of three predefined categories: “Overtly Aggressive”, “Covertly Aggressive” and “Non-Aggressive”. We experimented with six classification models; our CNN model gave the best result, with a prediction accuracy of 73.2% under 10-fold cross-validation.
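The evaluation protocol named above, k-fold cross-validation over three aggression classes, can be illustrated with a minimal sketch. This is not the CNN or the pipeline used in the paper. The label codes `OAG`, `CAG` and `NAG`, the function names, the contiguous unshuffled split, and the majority-class placeholder model are all assumptions made for illustration; any model exposing the same `fit` interface could be substituted.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical short codes for the three categories (an assumption, not from the text).
LABELS = ("OAG", "CAG", "NAG")

Predictor = Callable[[str], str]
Fitter = Callable[[Sequence[str], Sequence[str]], Predictor]


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds and return (train, test) index lists."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    base, extra = divmod(n, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds get one additional item so sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def majority_baseline(train_texts: Sequence[str], train_labels: Sequence[str]) -> Predictor:
    """Placeholder model: always predicts the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda text: most_common


def cross_validated_accuracy(
    texts: Sequence[str], labels: Sequence[str], fit: Fitter, k: int = 10
) -> float:
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    scores = []
    for train, test in k_fold_indices(len(texts), k):
        predict = fit([texts[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(texts[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

In practice the data would normally be shuffled, or split with stratification so that each fold preserves the class proportions, before the folds are formed. The contiguous split is kept here only to make the sketch deterministic.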
