Tom Jumbo-Grumbo at SemEval-2019 Task 4: Hyperpartisan News Detection with GloVe vectors and SVM

In this paper, we describe our attempt to learn bias from news articles. From our experiments, it seems that although there is a correlation between publisher bias and article bias, it is challenging to learn bias directly from the publisher labels. On the other hand, using few manually-labeled samples can increase the accuracy metric from around 60% to near 80%. Our system is computationally inexpensive and uses several standard document representations in NLP to train an SVM or LR classifier. The system ranked 4th in the SemEval-2019 task. The code is released for reproducibility.

[1]  Lei Zhang,et al.  Sentiment Analysis and Opinion Mining , 2017, Encyclopedia of Machine Learning and Data Mining.

[2]  Steven Skiena,et al.  Multi-view Models for Political Ideology Detection of News Articles , 2018, EMNLP.

[3]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[4]  Christopher D. Manning,et al.  Baselines and Bigrams: Simple, Good Sentiment and Topic Classification , 2012, ACL.

[5]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[6]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[7]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[8]  Diyi Yang,et al.  Hierarchical Attention Networks for Document Classification , 2016, NAACL.

[9]  Mattias Polborn,et al.  Political Polarization and the Electoral Effects of Media Bias , 2006, SSRN Electronic Journal.

[10]  Benno Stein,et al.  A Stylometric Inquiry into Hyperpartisan and Fake News , 2017, ACL.

[11]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[12]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[13]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[14]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[15]  Daniel Jurafsky,et al.  Linguistic Models for Analyzing and Detecting Biased Language , 2013, ACL.

[16]  Gerhard Weikum,et al.  Leveraging Joint Interactions for Credibility Analysis in News Communities , 2015, CIKM.

[17]  Petr Sojka,et al.  Software Framework for Topic Modelling with Large Corpora , 2010 .