Paper information - GermEval 2018: German Abusive Tweet Detection

GermEval 2018: German Abusive Tweet Detection

The TUWienKBS system for abusive tweet detection in the GermEval 2018 competition is a stacked classifier. It uses five disjoint feature sets: token n-grams, character n-grams, relatedness to the tokens that TF-IDF ranks as most important within each class, relatedness to the character n-grams that TF-IDF ranks as most important within each class, and the average of the embedding vectors of all tokens in a tweet. Three base classifiers (one maximum entropy classifier and two random forest ensembles) are trained independently on each of these feature sets, which yields 15 predictions (3 classifiers × 5 feature sets) for the type and/or level of abusiveness of the given tweets. A single maximum entropy meta-level classifier then performs the final classification. As a word embedding fallback for out-of-vocabulary tokens, we use the embeddings of the longest prefix and the longest suffix of the token, if such embeddings can be found.
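The out-of-vocabulary fallback and the averaged tweet embedding described above can be illustrated with a minimal sketch. This is not the authors' code. The abstract does not say how the prefix and suffix vectors are combined or whether a minimum affix length applies, so the averaging step, the `min_len` parameter, the function names, and the toy vocabulary are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise average of equally sized vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def longest_known_affix(token: str, vocab: Dict[str, Vector], *,
                        suffix: bool, min_len: int = 3) -> Optional[str]:
    """Return the longest proper prefix (or suffix) of `token` present in `vocab`.

    `min_len` is an assumed cutoff that avoids matching very short fragments.
    """
    for length in range(len(token) - 1, min_len - 1, -1):
        piece = token[-length:] if suffix else token[:length]
        if piece in vocab:
            return piece
    return None


def embed_token(token: str, vocab: Dict[str, Vector],
                min_len: int = 3) -> Optional[Vector]:
    """Look up a token; for unknown tokens fall back to prefix/suffix vectors."""
    if token in vocab:
        return vocab[token]
    pieces = [
        longest_known_affix(token, vocab, suffix=False, min_len=min_len),
        longest_known_affix(token, vocab, suffix=True, min_len=min_len),
    ]
    found = [vocab[p] for p in pieces if p is not None]
    # Assumption: average whatever affix vectors were found.
    return mean(found) if found else None


def embed_tweet(tokens: List[str], vocab: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of all tokens that could be embedded."""
    vectors = [v for v in (embed_token(t, vocab) for t in tokens) if v is not None]
    return mean(vectors) if vectors else None


# Toy two-dimensional vocabulary (hypothetical values, for illustration only).
toy_vocab: Dict[str, Vector] = {"hass": [1.0, 0.0], "rede": [0.0, 1.0]}

# "hassrede" is unknown, but its longest known prefix "hass" and
# longest known suffix "rede" both have vectors.
print(embed_token("hassrede", toy_vocab))  # [0.5, 0.5]
```

Tokens for which neither a prefix nor a suffix is found return `None` here and are simply skipped when averaging; how the actual system treats such tokens is not stated in the abstract.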

Joaquín Padilla Montani
