The Role of Computational Stylometry in Identifying (Misogynistic) Aggression in English Social Media Texts

In this paper, we describe the participation of the UniOr_ExpSys team in the TRAC-2 (Trolling, Aggression and Cyberbullying) shared task, held as part of a workshop at LREC 2020. The TRAC-2 shared task comprises two sub-tasks: Aggression Identification (a 3-way classification of text data as “Overtly Aggressive”, “Covertly Aggressive” or “Non-aggressive”) and Misogynistic Aggression Identification (a binary classification of texts as “gendered” or “non-gendered”). Our approach combines linguistic rules, stylistic feature extraction through stylometric analysis, and the Sequential Minimal Optimization algorithm to build the two classifiers.
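
To make the notion of stylistic feature extraction concrete, the following is a minimal Python sketch of how a few surface-level stylometric features might be computed for a single post. The function name `stylometric_features` and the particular features chosen (token count, average word length, type-token ratio, uppercase ratio, punctuation ratio, exclamation count) are illustrative assumptions, not the feature set used by the system described here.

```python
import re
import string


def stylometric_features(text: str) -> dict:
    """Compute a few illustrative surface features for one text.

    The feature set is hypothetical and chosen only for illustration.
    """
    tokens = re.findall(r"\w+", text)          # word-like tokens
    letters = [c for c in text if c.isalpha()]  # alphabetic characters only
    n_tokens = len(tokens)
    n_chars = len(text)

    return {
        "n_tokens": n_tokens,
        # Mean token length in characters
        "avg_word_len": sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0,
        # Distinct lowercased tokens divided by total tokens
        "type_token_ratio": len({t.lower() for t in tokens}) / n_tokens if n_tokens else 0.0,
        # Share of letters that are uppercase (a rough proxy for "shouting")
        "upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        # Share of all characters that are ASCII punctuation
        "punct_ratio": sum(c in string.punctuation for c in text) / n_chars if n_chars else 0.0,
        "exclamations": text.count("!"),
    }


if __name__ == "__main__":
    for name, value in stylometric_features("YOU are SO dumb!!!").items():
        print(f"{name}: {value}")
```

Feature vectors of this kind could then be passed to a support vector machine trained with Sequential Minimal Optimization, with one model per sub-task. The training step is omitted here because the abstract does not specify its configuration.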
