Automatic detection of sexist content in memes

Online social media platforms and websites have become crucial in our society as instruments that define our identity and our relationships through the content we consume. Social issues such as sexism¹ are transmitted and spread online through offensive images and texts conveying several forms of hate against women. Automatic detection of multimedia sexist content is therefore necessary, and we focus on memes. To this end, we have collected and validated a dataset of 800 memes (the MIME dataset), and we propose a multimodal classifier that combines visual and textual features to automatically detect sexist content.

∗ All authors contributed equally to this research.
¹ Sexism is an ideology based on discrimination on the basis of gender, often directed at women.

Few works are dedicated to the automatic detection of offensive content, and these typically use only one type of media. One study [4] targeted YouTube, detecting violence in videos using three different types of media (audio, video and text). Dinakar et al. [2] constructed a corpus of YouTube comments on sensitive topics such as race and tried to classify them using bag-of-words text classification. The first attempt to explore the automatic detection of sexist multimedia content was made in 2018 by Gasparini et al. [3], who considered advertisements. In the same year, Anzovino et al. [1] studied the detection and classification of misogynistic text collected from Twitter.

Figure 1: Pipeline for automatic detection of sexist content.

Our work (Figure 1) started by collecting 800 sexist and non-sexist English memes for the MIME dataset from 89 different sources, including social media, websites and forums, trying to balance the two categories. To validate the dataset, we ran a questionnaire on the Figure Eight platform (https://www.figure-eight.com/) covering the 800 collected memes. We involved 60 participants, 30 males and 30 females, distributed evenly across three age ranges: 20 people aged 21–30, 20 aged 31–40 and 20 aged 41–50, obtaining 3 judgements for each meme. Showing one meme at a time, the questionnaire asked the participants whether: i) the meme is sexist and, if so, which medium carries the sexist content (text, image, or both); ii) the meme is aggressive; and iii) the meme is ironic. Here we focus only on the sexism validation. The results (Table 1) confirm that our starting dataset is balanced between sexist and non-sexist memes, and that the judgements were mainly based on the text or on the combination of text and image. The image alone was rarely considered sexist.

Table 1: Media involved in the identification of sexist content.