Cost-Sensitive BERT for Generalisable Sentence Classification on Imbalanced Data

The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed. It is not surprising that this task can be addressed effectively using BERT, a powerful architecture that can be fine-tuned for text classification tasks. However, propaganda detection, like other tasks involving news documents and other forms of decontextualised social communication (e.g. sentiment analysis), inherently deals with data whose categories are simultaneously imbalanced and dissimilar. We show that BERT, while capable of handling imbalanced classes without additional data augmentation, does not generalise well when the training and test data are sufficiently dissimilar (as is often the case with news sources, whose topics evolve over time). We address this problem by providing a statistical measure of similarity between datasets and a method for incorporating cost-weighting into BERT when the training and test sets are dissimilar. We test these methods on the Propaganda Techniques Corpus (PTC) and achieve the second-highest score on sentence-level propaganda classification.
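A minimal sketch of the cost-weighting idea, assuming PyTorch and the HuggingFace transformers library; the model name, class-weight values, and example data below are illustrative placeholders, not the authors' exact configuration. The core move is to up-weight the minority (propaganda) class in the cross-entropy loss used to fine-tune BERT:

```python
# Illustrative sketch: cost-weighted cross-entropy on top of a BERT
# sentence classifier. The weight values here are placeholders; in
# practice they would be tuned on a validation set.
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Up-weight the minority class (1 = propaganda) relative to the
# majority class (0 = non-propaganda).
class_weights = torch.tensor([1.0, 4.0])  # hypothetical values
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

texts = ["An example news sentence.", "Another example sentence."]
labels = torch.tensor([0, 1])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
logits = model(**batch).logits       # shape: (batch_size, num_labels)
loss = loss_fn(logits, labels)       # errors on class 1 cost 4x as much
loss.backward()
```

The abstract does not name its statistical similarity measure, so the following is only one common choice for comparing two corpora: a chi-square statistic over the most frequent words, where a larger value indicates greater dissimilarity between training and test data. The function name and the `n_words` cutoff are hypothetical:

```python
# Hypothetical illustration of a corpus-similarity statistic: chi-square
# over the top-n word frequencies of two tokenised corpora.
from collections import Counter

def chi_square_dissimilarity(corpus_a, corpus_b, n_words=500):
    """Chi-square over the n most frequent words of two token lists."""
    freq_a, freq_b = Counter(corpus_a), Counter(corpus_b)
    vocab = [w for w, _ in (freq_a + freq_b).most_common(n_words)]
    total_a, total_b = sum(freq_a.values()), sum(freq_b.values())
    chi2 = 0.0
    for w in vocab:
        o_a, o_b = freq_a[w], freq_b[w]
        # Expected counts if both corpora shared one word distribution.
        e_a = (o_a + o_b) * total_a / (total_a + total_b)
        e_b = (o_a + o_b) * total_b / (total_a + total_b)
        chi2 += (o_a - e_a) ** 2 / e_a + (o_b - e_b) ** 2 / e_b
    return chi2
```

Under this kind of measure, a high train/test dissimilarity would signal that a cost-weighted loss, rather than plain fine-tuning, is likely to be needed.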
