CyberWallE at SemEval-2020 Task 11: An Analysis of Feature Engineering for Ensemble Models for Propaganda Detection

This paper describes our participation in SemEval-2020 Task 11, Detection of Propaganda Techniques in News Articles. We participate in both subtasks: Span Identification (SI) and Technique Classification (TC). We use a bi-LSTM architecture for the SI subtask and train a complex ensemble model for the TC subtask. Our architectures combine BERT embeddings with additional lexical features and rely on extensive label post-processing. Our systems rank 8th out of 35 teams in the SI subtask (F1-score: 43.86%) and 8th out of 31 teams in the TC subtask (F1-score: 57.37%).
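The SI subtask expects propaganda fragments as character offsets into each article, while a token-level tagger such as a bi-LSTM emits one label per token, so predictions have to be mapped back to character spans. The following is a minimal sketch of that mapping, not the authors' actual post-processing. It assumes whitespace tokenization and binary token labels (1 = inside a propaganda fragment). The helper names `whitespace_offsets` and `tokens_to_spans` and the `max_gap` merging rule are hypothetical illustrations.

```python
import re
from typing import List, Tuple

# (start, end) character offsets into the article text; end is exclusive.
Span = Tuple[int, int]


def whitespace_offsets(text: str) -> List[Span]:
    """Hypothetical tokenizer: character offsets of each whitespace-separated token."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def tokens_to_spans(offsets: List[Span], labels: List[int], max_gap: int = 1) -> List[Span]:
    """Turn per-token binary labels into character-level spans.

    Consecutive positive tokens separated by at most `max_gap` characters
    (e.g. a single space) are merged into one span. `max_gap` is an assumed
    post-processing parameter chosen for illustration only.
    """
    spans: List[Span] = []
    for (start, end), label in zip(offsets, labels):
        if label != 1:
            continue  # token predicted as non-propaganda
        if spans and start - spans[-1][1] <= max_gap:
            # Extend the previous span across the short gap.
            spans[-1] = (spans[-1][0], end)
        else:
            spans.append((start, end))
    return spans


if __name__ == "__main__":
    text = "They are destroying our great nation today"
    offsets = whitespace_offsets(text)
    labels = [0, 0, 1, 1, 1, 1, 0]  # made-up tagger output for illustration
    for start, end in tokens_to_spans(offsets, labels):
        print(start, end, text[start:end])
```

With these made-up labels, the four positive tokens merge into the single span (9, 36), "destroying our great nation". A token labeled 0 between two positive tokens keeps them in separate spans.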
