He said “who’s gonna take care of your children when you are at ACL?”: Reported Sexist Acts are Not Sexist

In a context where offensive content moderation on social media is now regulated by European laws, it is important not only to detect sexist content automatically but also to determine whether a message with sexist content is itself sexist or is a report of sexism experienced by a woman. We propose: (1) a new characterization of sexist content inspired by speech act theory and discourse analysis studies, (2) the first French dataset annotated for sexism detection, and (3) a set of deep learning experiments trained on top of a combination of several vectorial representations of tweets (word embeddings, linguistic features, and various generalization strategies). Our results are encouraging and constitute a first step towards offensive content moderation.
