LTL-UDE at SemEval-2019 Task 6: BERT and Two-Vote Classification for Categorizing Offensiveness

We present results for Subtask A and C of SemEval 2019 Shared Task 6. In Subtask A, we experiment with an embedding representation of postings and use BERT to categorize postings. Our best result reaches the 10th place (out of 103). In Subtask C, we applied a two-vote classification approach with minority fallback, which is placed on the 19th rank (out of 65).

[1]  Julia Hirschberg,et al.  Detecting Hate Speech on the World Wide Web , 2012 .

[2]  Michael Wiegand,et al.  Inducing a Lexicon of Abusive Words – a Feature-Based Approach , 2018, NAACL.

[3]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[4]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[5]  Ritesh Kumar,et al.  Benchmarking Aggression Identification in Social Media , 2018, TRAC@COLING 2018.

[6]  Tomas Mikolov,et al.  Advances in Pre-Training Distributed Word Representations , 2017, LREC.

[7]  Michael Wiegand,et al.  Overview of the GermEval 2018 Shared Task on the Identification of Offensive Language , 2018 .

[8]  Torsten Zesch,et al.  Do Women Perceive Hate Differently: Examining the Relationship Between Hate Speech, Gender, and Agreement Judgments , 2018, KONVENS.

[9]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[10]  Preslav Nakov,et al.  Predicting the Type and Target of Offensive Posts in Social Media , 2019, NAACL.

[11]  Dirk Hovy,et al.  Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter , 2016, NAACL.

[12]  Joel R. Tetreault,et al.  Abusive Language Detection in Online User Content , 2016, WWW.

[13]  Carolyn Penstein Rosé,et al.  Detecting offensive tweets via topical feature discovery over a large scale twitter corpus , 2012, CIKM.

[14]  Brendan T. O'Connor,et al.  Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments , 2010, ACL.

[15]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[16]  Viviana Patti,et al.  Hurtlex: A Multilingual Lexicon of Words to Hurt , 2018, CLiC-it.

[17]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[18]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[19]  Preslav Nakov,et al.  SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval) , 2019, *SEMEVAL.

[20]  Ingmar Weber,et al.  Understanding Abuse: A Typology of Abusive Language Detection Subtasks , 2017, ALW@ACL.

[21]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[22]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[23]  Henry Lieberman,et al.  Script-based story matching for cyberbullying prevention , 2013, CHI Extended Abstracts.