INGEOTEC at SemEval-2020 Task 12: Multilingual Classification of Offensive Text

This paper describes our participation in the OffensEval 2020 challenges for English, Arabic, Danish, Turkish, and Greek. We tried several approaches, including μTC, TextCategorization, and EvoMSA. The best results were achieved with EvoMSA, a multilingual and domain-independent architecture that combines the predictions of different knowledge sources to solve text classification problems.
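The abstract describes EvoMSA as combining the predictions of several knowledge sources. The sketch below shows that general idea as two-stage stacking, using only the Python standard library. It is not EvoMSA's implementation or API. Everything in it is hypothetical: the two base models (a word-level Naive Bayes and a fixed lexicon scorer), the `OFFENSIVE_WORDS` lexicon, the toy data, and the class names. A plain logistic-regression combiner also stands in for the second stage, which in EvoMSA is, as we understand it, the EvoDAG genetic-programming classifier.

```python
import math
from collections import Counter

# Hypothetical lexicon: a fixed, untrained "knowledge source".
OFFENSIVE_WORDS = {"idiot", "stupid", "moron"}

def tokens(text):
    return text.lower().split()

def sigmoid(z):
    # Numerically stable logistic function (avoids math.exp overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class NaiveBayes:
    """Binary multinomial Naive Bayes; score() is a smoothed log-odds of class 1."""
    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        for text, y in zip(texts, labels):
            self.counts[y].update(tokens(text))
        # Vocabulary size plus one slot reserved for unseen tokens.
        self.v = len(set(self.counts[0]) | set(self.counts[1])) + 1
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        docs = Counter(labels)
        self.prior = math.log((docs[1] + 1) / (docs[0] + 1))
        return self

    def score(self, text):
        s = self.prior
        for tok in tokens(text):
            # Laplace-smoothed token likelihoods under each class.
            p1 = (self.counts[1][tok] + 1) / (self.totals[1] + self.v)
            p0 = (self.counts[0][tok] + 1) / (self.totals[0] + self.v)
            s += math.log(p1 / p0)
        return s

class LexiconScorer:
    """Counts lexicon hits; fit() does nothing because the lexicon is fixed."""
    def __init__(self, lexicon):
        self.lexicon = lexicon

    def fit(self, texts, labels):
        return self

    def score(self, text):
        return sum(tok in self.lexicon for tok in tokens(text))

class LogisticCombiner:
    """Second-stage model over base-model scores (a stand-in for EvoDAG)."""
    def fit(self, rows, labels, lr=0.1, epochs=500):
        self.w, self.b = [0.0] * len(rows[0]), 0.0
        for _ in range(epochs):
            for x, y in zip(rows, labels):
                err = self.predict_proba(x) - y  # d(log loss)/dz
                self.w = [w - lr * err * xi for w, xi in zip(self.w, x)]
                self.b -= lr * err
        return self

    def predict_proba(self, x):
        return sigmoid(sum(w * xi for w, xi in zip(self.w, x)) + self.b)

class StackedClassifier:
    def __init__(self, make_models, k=2):
        self.make_models, self.k = make_models, k  # factory of fresh base models

    def fit(self, texts, labels):
        # Out-of-fold scores: each text is scored by base models that never saw it.
        rows = [None] * len(texts)
        for fold in range(self.k):
            train = [i for i in range(len(texts)) if i % self.k != fold]
            models = [m.fit([texts[i] for i in train], [labels[i] for i in train])
                      for m in self.make_models()]
            for i in range(fold, len(texts), self.k):
                rows[i] = [m.score(texts[i]) for m in models]
        self.combiner = LogisticCombiner().fit(rows, labels)
        # Refit the base models on all the data for use at prediction time.
        self.models = [m.fit(texts, labels) for m in self.make_models()]
        return self

    def predict(self, text):
        x = [m.score(text) for m in self.models]
        return int(self.combiner.predict_proba(x) >= 0.5)

# Toy data invented for illustration: 1 = offensive, 0 = not offensive.
TEXTS = ["stupid idiot", "you stupid idiot", "nice day", "nice work friend",
         "idiot stupid", "total idiot", "have a nice day", "nice game"]
LABELS = [1, 1, 0, 0, 1, 1, 0, 0]

clf = StackedClassifier(lambda: [NaiveBayes(), LexiconScorer(OFFENSIVE_WORDS)]).fit(TEXTS, LABELS)
print(clf.predict("stupid idiot"), clf.predict("nice day"))
```

The combiner's training rows are computed out of fold (here with k = 2), so it learns from base-model scores that were not fitted on the very texts being scored. This is a common stacking practice and is not a detail taken from this paper.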
