An experimental study on feature engineering and learning approaches for aggression detection in social media

With the widespread adoption of modern technologies and social media networks, a new form of bullying that can occur anytime and anywhere has emerged. This phenomenon, known as cyberaggression or cyberbullying, refers to aggressive and intentional acts aimed at repeatedly causing harm to another person through rude, insulting, offensive, teasing or demoralising comments on online social media. As these aggressions are a threatening experience for Internet users, especially kids and teens who are still shaping their identities, social relations and well-being, it is crucial to understand how cyberbullying occurs in order to prevent it from escalating. Given the massive amount of information on the Web, the development of intelligent techniques for automatically detecting harmful content is gaining importance, as such techniques allow large-scale social media to be monitored and unwanted and aggressive situations to be detected early. Even though several approaches based on both traditional and deep learning techniques have been developed over the last few years, concerns remain over the duplication of research and the difficulty of comparing results. Moreover, there is no agreement on which type of technique is better suited for the task, or on the type of features on which learning should be based. The goal of this work is to shed some light on the effects of learning paradigms and feature engineering approaches on the detection of aggression in social media texts. To that end, this work evaluates diverse traditional and deep learning techniques, trained on diverse sets of features, across multiple social media sites.
