MultiAzterTest@Exist-IberLEF 2021: Linguistically Motivated Sexism Identification

Identifying sexism in social networks is the focus of the EXIST-IberLEF 2021 shared task. By participating in this task, the MultiAzterTest team aims to determine whether linguistically motivated features can help detect sexism. To that end, we present three approaches: i) an approach based on language models, ii) an approach based on linguistic and stylistic features combined with machine learning classifiers, and iii) an approach that combines the previous two. The language model approach obtains the best results on the test data. However, the approaches that use linguistic and stylistic features offer more interpretability.
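The interpretability argument rests on each input to the classifier being a named, human-readable measurement of the text. The sketch below illustrates that idea with a handful of simple stylistic features computed from a tweet. It is a minimal illustration under stated assumptions and not the MultiAzterTest feature set, which is far richer. The names `StylisticFeatures`, `extract_features` and `to_vector`, the token regex and the chosen features are all hypothetical.

```python
import re
from dataclasses import dataclass, asdict, fields

# Hypothetical, simplified stylistic features (NOT the MultiAzterTest feature set).
@dataclass
class StylisticFeatures:
    num_tokens: int
    mean_word_length: float
    type_token_ratio: float
    uppercase_ratio: float
    exclamation_count: int
    mention_count: int
    hashtag_count: int

# Assumed tokenizer: word characters, optionally prefixed by '@' or '#'.
TOKEN_RE = re.compile(r"[#@]?\w+")

def extract_features(text: str) -> StylisticFeatures:
    tokens = TOKEN_RE.findall(text)
    # Plain words exclude @mentions and #hashtags.
    words = [t for t in tokens if not t.startswith(("@", "#"))]
    letters = [c for c in text if c.isalpha()]
    return StylisticFeatures(
        num_tokens=len(tokens),
        mean_word_length=sum(len(w) for w in words) / len(words) if words else 0.0,
        type_token_ratio=len({w.lower() for w in words}) / len(words) if words else 0.0,
        uppercase_ratio=sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        exclamation_count=text.count("!"),
        mention_count=sum(t.startswith("@") for t in tokens),
        hashtag_count=sum(t.startswith("#") for t in tokens),
    )

def feature_names() -> list[str]:
    # Column names, so a trained model's weights can be read per feature.
    return [f.name for f in fields(StylisticFeatures)]

def to_vector(feats: StylisticFeatures) -> list[float]:
    # Numeric vector in the same order as feature_names().
    return [float(v) for v in asdict(feats).values()]

if __name__ == "__main__":
    f = extract_features("@maria Great talk today!! #IberLEF")
    for name, value in zip(feature_names(), to_vector(f)):
        print(f"{name}: {value:.3f}")
```

Vectors built this way could be passed to any standard machine learning classifier. Because every column has a name, a learned weight or split can be traced back to a concrete property of the text. A single language-model score offers no such link.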
