论文信息 - DeepBlueAI at SemEval-2021 Task 7: Detecting and Rating Humor and Offense with Stacking Diverse Language Model-Based Methods

DeepBlueAI at SemEval-2021 Task 7: Detecting and Rating Humor and Offense with Stacking Diverse Language Model-Based Methods

This paper describes the winning system for SemEval-2021 Task 7: Detecting and Rating Humor and Offense. Our strategy is stacking diverse pre-trained language models (PLMs) such as RoBERTa and ALBERT. We first perform fine-tuning on these two PLMs with various hyperparameters and different training strategies. Then a valid stacking mechanism is applied on top of the fine-tuned PLMs to get the final prediction. Experimental results on the dataset released by the organizer of the task show the validity of our method and we win first place and third place for subtask 2 and 1a.

[1] Geoffrey E. Hinton,et al. Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[2] Preslav Nakov,et al. SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval) , 2019, *SEMEVAL.

[3] Michael Gamon,et al. “President Vows to Cut Hair”: Dataset and Analysis of Creative Text Editing for Humorous Headlines , 2019, NAACL.

[4] Kevin Gimpel,et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.

[5] Doug Downey,et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks , 2020, ACL.

[6] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.

[7] Hiroshi Inoue,et al. Multi-Sample Dropout for Accelerated Training and Better Generalization , 2019, ArXiv.

[8] Henry A. Kautz,et al. SemEval-2020 Task 7: Assessing Humor in Edited News Headlines , 2020, SEMEVAL.

[9] Walid Magdy,et al. SemEval 2021 Task 7: HaHackathon, Detecting and Rating Humor and Offense , 2021, SEMEVAL.

[10] Andrew M. Dai,et al. Adversarial Training Methods for Semi-Supervised Text Classification , 2016, ICLR.