Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection

In this paper, we describe our submission to SemEval-2019 Task 4 on Hyperpartisan News Detection. Our system relies on a variety of engineered features originally used to detect propaganda, on the assumption that biased messages are propagandistic in that they promote a particular political cause or viewpoint. We trained a logistic regression model with features ranging from simple bag-of-words to vocabulary richness and text readability. Our system achieved 72.9% accuracy on the manually annotated test data and 60.8% on the test data annotated with distant supervision. Additional experiments showed that significant performance improvements can be achieved with better feature pre-processing.
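To make the feature families named in the abstract more concrete, the following is a minimal sketch of how vocabulary richness and readability features might be computed for one article. The specific measures (type-token ratio, hapax ratio, Yule's K, Automated Readability Index), the tokenizer, the sentence splitter, and the function names are assumptions chosen for illustration; the abstract does not say which measures or pre-processing the system used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple, hypothetical tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text):
    """Naive sentence split on '.', '!' and '?' (an assumption for this sketch)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def style_features(text):
    """Return a dict of illustrative vocabulary richness and readability features."""
    tokens = tokenize(text)
    n = len(tokens)
    if n == 0:
        return {"ttr": 0.0, "hapax_ratio": 0.0, "yules_k": 0.0, "ari": 0.0}

    counts = Counter(tokens)

    # Type-token ratio: distinct words divided by total words.
    ttr = len(counts) / n

    # Hapax ratio: share of tokens whose word type occurs exactly once.
    hapax_ratio = sum(1 for c in counts.values() if c == 1) / n

    # Yule's K: 10^4 * (sum_i i^2 * V_i - N) / N^2,
    # where V_i is the number of word types that occur i times.
    freq_of_freq = Counter(counts.values())
    s2 = sum(i * i * v for i, v in freq_of_freq.items())
    yules_k = 1e4 * (s2 - n) / (n * n)

    # Automated Readability Index:
    # 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43,
    # counting only letters and digits as characters.
    chars = sum(1 for t in tokens for ch in t if ch.isalnum())
    sentences = max(len(split_sentences(text)), 1)
    ari = 4.71 * chars / n + 0.5 * n / sentences - 21.43

    return {"ttr": ttr, "hapax_ratio": hapax_ratio, "yules_k": yules_k, "ari": ari}
```

In a setup like the one the abstract describes, values such as these would presumably be combined with bag-of-words counts into one feature vector per article and passed to a logistic regression classifier. Scaling them to comparable ranges first is one plausible form of the feature pre-processing the abstract mentions, though that is a guess rather than something the abstract states.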

Preslav Nakov | Alberto Barrón-Cedeño | James R. Glass | Giovanni Da San Martino | Ramy Baly | Abdelrhman Saleh | Mitra Mohtarami
