Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection

In this paper, we describe our submission to SemEval-2019 Task 4 on Hyperpartisan News Detection. Our system relies on a variety of engineered features originally designed to detect propaganda. This choice rests on the assumption that biased messages are propagandistic in the sense that they promote a particular political cause or viewpoint. We trained a logistic regression model with features ranging from simple bag-of-words to vocabulary richness and text readability. Our system achieved 72.9% accuracy on the manually annotated test data and 60.8% on the test data annotated with distant supervision. Additional experiments showed that better feature pre-processing can yield significant further performance improvements.
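To make the feature families named above more concrete, the following is a minimal sketch of how vocabulary richness and readability features might be computed for a single article. The specific measures (type-token ratio, hapax legomena ratio, Automated Readability Index), the naive tokenizer and sentence splitter, and all function names are assumptions chosen for illustration, not the exact feature set or pre-processing used in the submitted system.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens; a deliberately simple, hypothetical tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (illustrative only)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def style_features(text):
    """Return a few vocabulary richness and readability features for one text."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    n_tokens = len(tokens)
    if n_tokens == 0 or not sentences:
        # Empty input: return neutral values rather than dividing by zero.
        return {"ttr": 0.0, "hapax_ratio": 0.0, "ari": 0.0}

    counts = Counter(tokens)
    n_types = len(counts)                                  # distinct words
    n_hapax = sum(1 for c in counts.values() if c == 1)    # words seen once
    n_chars = sum(len(t) for t in tokens)                  # characters in tokens

    # Automated Readability Index:
    # 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43
    ari = 4.71 * n_chars / n_tokens + 0.5 * n_tokens / len(sentences) - 21.43

    return {
        "ttr": n_types / n_tokens,          # type-token ratio
        "hapax_ratio": n_hapax / n_tokens,  # share of once-only words
        "ari": ari,
    }


if __name__ == "__main__":
    print(style_features("The cat sat. The cat ran."))
```

In a full pipeline, such per-article values would typically be scaled and concatenated with bag-of-words vectors before being passed to a logistic regression classifier; that step is omitted here.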
