
The Role of Computational Stylometry in Identifying (Misogynistic) Aggression in English Social Media Texts

In this paper, we describe the participation of the UniOr_ExpSys team in the shared task of TRAC-2 (Trolling, Aggression and Cyberbullying), a workshop organized as part of LREC 2020. The TRAC-2 shared task is organized in two sub-tasks: Aggression Identification (a 3-way classification of text data as “Overtly Aggressive”, “Covertly Aggressive” or “Non-aggressive”) and Misogynistic Aggression Identification (a binary classification of texts as “gendered” or “non-gendered”). Our approach is based on linguistic rules, stylistic feature extraction through stylometric analysis, and the Sequential Minimal Optimization algorithm for building the two classifiers.
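To make the idea of stylistic feature extraction concrete, the following is a minimal sketch in Python. The feature set (token counts, type-token ratio, capitalization and punctuation ratios) and the function name `stylometric_features` are assumptions chosen for illustration; the abstract does not list the features the team actually used.

```python
import re
import string
from typing import Dict


def stylometric_features(text: str) -> Dict[str, float]:
    """Compute a few surface-level style features for one post.

    Illustrative feature set only; not the one used by the authors.
    """
    tokens = re.findall(r"\w+", text)            # word-like tokens
    letters = [c for c in text if c.isalpha()]   # alphabetic characters only
    n_tokens = len(tokens)
    n_chars = len(text)
    return {
        # Post length in tokens
        "n_tokens": float(n_tokens),
        # Mean token length in characters
        "avg_token_len": sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0,
        # Vocabulary richness: distinct lowercased tokens over all tokens
        "type_token_ratio": len({t.lower() for t in tokens}) / n_tokens if n_tokens else 0.0,
        # Share of letters that are uppercase ("shouting")
        "upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        # Share of characters that are ASCII punctuation
        "punct_ratio": sum(c in string.punctuation for c in text) / n_chars if n_chars else 0.0,
        # Raw count of exclamation marks
        "exclamations": float(text.count("!")),
    }


if __name__ == "__main__":
    print(stylometric_features("YOU are so dumb!!!"))
```

Feature vectors of this kind could then be passed to any support vector machine trainer, including one based on Sequential Minimal Optimization, to fit the 3-way and binary classifiers described above.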

Johanna Monti | Antonio Pascucci | Raffaele Manna | Vincenzo Masucci
