Saagie at Semeval-2019 Task 5: From Universal Text Embeddings and Classical Features to Domain-specific Text Classification

This paper describes our contribution to SemEval 2019 Task 5: Hateval. We propose to investigate how domain-specific text classification task can benefit from pretrained state of the art language models and how they can be combined with classical handcrafted features. For this purpose, we propose an approach based on a feature-level Meta-Embedding to let the model choose which features to keep and how to use them.

[1]  Mathieu Cliche,et al.  BB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with CNNs and LSTMs , 2017, *SEMEVAL.

[2]  Omer Levy,et al.  GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.

[3]  Holger Schwenk,et al.  Supervised Learning of Universal Sentence Representations from Natural Language Inference Data , 2017, EMNLP.

[4]  Aurélien Lucchi,et al.  SwissCheese at SemEval-2016 Task 4: Sentiment Classification Using an Ensemble of Convolutional Neural Networks with Distant Supervision , 2016, *SEMEVAL.

[5]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[6]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[7]  Sanja Fidler,et al.  Skip-Thought Vectors , 2015, NIPS.

[8]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[9]  Daniela Moctezuma,et al.  INGEOTEC at MEX-A3T: Author Profiling and Aggressiveness Analysis in Twitter Using μTC and EvoMSA , 2018, IberEval@SEPLN.

[10]  Alec Radford,et al.  Improving Language Understanding by Generative Pre-Training , 2018 .

[11]  Viviana Patti,et al.  14-ExLab@UniTo for AMI at IberEval2018: Exploiting Lexical Knowledge for Detecting Misogyny in English and Spanish Tweets , 2018, IberEval@SEPLN.

[12]  Sebastian Ruder,et al.  Fine-tuned Language Models for Text Classification , 2018, ArXiv.

[13]  Myle Ott,et al.  Understanding Back-Translation at Scale , 2018, EMNLP.

[14]  Paolo Rosso,et al.  Attention Mechanism for Aggressive Detection , 2018, IberEval@SEPLN.

[15]  Kyunghyun Cho,et al.  Dynamic Meta-Embeddings for Improved Sentence Representations , 2018, EMNLP.

[16]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[17]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[18]  Sanja Fidler,et al.  Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[20]  Paolo Rosso,et al.  SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter , 2019, *SEMEVAL.

[21]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[22]  Xiaodong Liu,et al.  Multi-Task Deep Neural Networks for Natural Language Understanding , 2019, ACL.