FraunhoferSIT at GermEval 2019: Can Machines Distinguish Between Offensive Language and Hate Speech? Towards a Fine-Grained Classification

In this paper, we describe the FraunhoferSIT submission for the “GermEval 2019 – Shared Task on the Identification of Offensive Language”. We participated in two subtasks. Task 1 is a binary classification of German tweets into offensive and non-offensive language. Task 2 is a fine-grained classification that distinguishes between three subcategories of offensive language. Our best model is an SVM classifier based on TF-IDF character n-gram features. Our runs submitted to the shared task are FraunhoferSIT coarse [1-3].txt for Task 1 and FraunhoferSIT fine [1-3].txt for Task 2. Our final system reaches a macro-average F1-score of 0.70 for the binary classification and an F1-score of 0.46 for the fine-grained classification. These results show that automatically distinguishing between offensive language and “Hate Speech” is far from solved.
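To make the model type concrete, the sketch below is a dependency-free illustration of an SVM classifier over TF-IDF character n-gram features. It is not the authors' implementation, and their exact configuration is not given here. The following are all assumptions chosen for illustration: the n-gram range (2 to 4), the space padding, the Pegasos-style subgradient training with a deterministic cyclic pass, the one-vs-rest scheme for more than two labels, the `__bias__` intercept feature, the hyperparameters `lam` and `epochs`, and every helper name (`char_ngrams`, `CharTfidf`, `train_pegasos`, `OneVsRestSVM`, `macro_f1`). The label names `OFFENSE`/`OTHER` are likewise assumed.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "  # padding (an assumption) marks text boundaries
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


class CharTfidf:
    """TF-IDF over character n-grams with smoothed IDF and L2 normalisation."""

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(char_ngrams(doc)))  # document frequency per n-gram
        n_docs = len(docs)
        # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
        self.idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
        return self

    def transform(self, doc):
        # Raw term counts; n-grams never seen during fit are dropped.
        tf = Counter(g for g in char_ngrams(doc) if g in self.idf)
        vec = {g: c * self.idf[g] for g, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {g: v / norm for g, v in vec.items()}


def dot(w, x):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(xs, ys, lam=0.01, epochs=10):
    """Linear SVM (hinge loss + L2 penalty) via Pegasos-style subgradient steps.

    ys are +1/-1. Uses a deterministic cyclic pass instead of random sampling.
    """
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1  # hinge loss is active for this example
            w = {k: v * (1 - eta * lam) for k, v in w.items()}  # L2 shrinkage
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


class OneVsRestSVM:
    BIAS = "__bias__"  # constant feature standing in for the intercept

    def fit(self, docs, labels, **svm_args):
        self.vectorizer = CharTfidf().fit(docs)
        xs = [self._features(d) for d in docs]
        self.classes = sorted(set(labels))
        # One binary SVM per class: that class vs. all other classes.
        self.weights = {
            c: train_pegasos(xs, [1 if lab == c else -1 for lab in labels], **svm_args)
            for c in self.classes
        }
        return self

    def _features(self, doc):
        x = self.vectorizer.transform(doc)
        x[self.BIAS] = 1.0
        return x

    def predict(self, doc):
        x = self._features(doc)
        return max(self.classes, key=lambda c: dot(self.weights[c], x))


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

Usage would be `OneVsRestSVM().fit(train_texts, train_labels)` followed by `predict(text)` per tweet. The same class handles the fine-grained task because it trains one binary SVM per label. `macro_f1` averages per-class F1 without weighting. The official shared-task scorer may define macro-F1 differently, for example as the harmonic mean of macro-precision and macro-recall. In practice, a roughly comparable setup would use scikit-learn's `TfidfVectorizer(analyzer="char", ngram_range=(2, 4))` with `LinearSVC`.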