Multilingual Cross-domain Perspectives on Online Hate Speech

In this report, we present a study of eight corpora of online hate speech and demonstrate the NLP techniques that we used to collect and analyze jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we focused on text classification, text profiling, and keyword and collocation extraction, along with manual annotation and qualitative study.
