MT-GAN-BERT: Multi-Task and Generative Adversarial Learning for Sustainable Language Processing

In this paper, we present MT-GAN-BERT, a BERT-based architecture for faceted classification tasks. It aims to reduce the requirements of Transformers in terms of both the amount of annotated data and the computational cost at classification time. First, MT-GAN-BERT enables semi-supervised learning in BERT-based architectures through Generative Adversarial Learning. Second, it implements a Multi-task Learning approach to solve multiple tasks simultaneously: a single BERT-based model encodes the input examples, while multiple linear layers implement the classification steps, which significantly reduces computational costs. Experimental evaluations on six classification tasks involved in detecting abusive language in Italian suggest that MT-GAN-BERT is a sustainable solution that generally improves on the straightforward adoption of multiple BERT-based models, with lighter requirements in terms of annotated data and computational costs.
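The multi-task idea described above (one shared encoder, one linear layer per task) can be illustrated with a minimal sketch. This is not the authors' implementation: `ToyEncoder` is a hypothetical stand-in for a BERT encoder that returns a deterministic pseudo-embedding, the hidden size of 8 is a toy value, and the task names and class counts are placeholders. The adversarial, semi-supervised component is omitted. The point is only that the expensive encoding step runs once per input, however many tasks are served.

```python
import random

HIDDEN = 8  # toy hidden size, assumed for illustration only


class ToyEncoder:
    """Hypothetical stand-in for a shared BERT encoder."""

    def __init__(self):
        self.calls = 0  # counts how many encoding passes were run

    def encode(self, text):
        self.calls += 1
        rng = random.Random(text)  # deterministic pseudo-embedding per text
        return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]


class LinearHead:
    """One task-specific linear layer mapping the encoding to class logits."""

    def __init__(self, num_classes, seed):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
            for _ in range(num_classes)
        ]
        self.bias = [0.0] * num_classes

    def logits(self, vec):
        return [
            sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(self.weights, self.bias)
        ]


def classify_all(encoder, heads, text):
    """Encode the input once, then apply every task head to that encoding."""
    vec = encoder.encode(text)  # single shared encoder pass
    return {task: head.logits(vec) for task, head in heads.items()}


# Six placeholder tasks with two classes each (names and sizes are assumptions).
encoder = ToyEncoder()
heads = {f"task_{i}": LinearHead(num_classes=2, seed=i) for i in range(6)}
outputs = classify_all(encoder, heads, "un esempio di testo")
print(encoder.calls, sorted(outputs))
```

With six independent models, the same input would need six encoder passes. Here the counter `encoder.calls` stays at one, and each additional task adds only a small linear layer.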
