A Retrospective Analysis of the Fake News Challenge Stance-Detection Task

The 2017 Fake News Challenge Stage 1 (FNC-1) shared task addressed stance classification as a crucial first step towards detecting fake news. To date, no in-depth analysis has critically discussed FNC-1’s experimental setup, reproduced its results, and drawn conclusions for next-generation stance classification methods. In this paper, we provide such an in-depth analysis for the three top-performing systems. We first find that FNC-1’s proposed evaluation metric favors the majority class, which can be classified easily, and thus overestimates the true discriminative power of the methods. We therefore propose a new F1-based metric, which yields a different system ranking. Next, we compare the features and architectures used, which leads to a novel feature-rich stacked LSTM model that performs on par with the best systems but is superior in predicting the minority classes. To assess the methods’ ability to generalize, we derive a new dataset and perform both in-domain and cross-domain experiments. Our qualitative and quantitative study helps to interpret the original FNC-1 scores and to understand which features improve performance and why. Our new dataset and all source code used in the reproduction study are publicly available for future research.
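To make the difference between the two evaluation schemes concrete, the sketch below contrasts the original relatedness-weighted FNC-1 score with a class-wise macro-averaged F1 on a small, heavily imbalanced toy label set. It is illustrative only: the 0.25/0.75 weighting, the four label names, and the choice of macro-averaging are assumptions based on the commonly cited FNC-1 setup, not the paper’s exact scorer. A baseline that always predicts the dominant "unrelated" class already collects a sizeable share of the weighted score, while its macro F1 stays low.

```python
# Illustrative sketch (not the official FNC-1 scorer): contrasts a
# relatedness-weighted FNC-1-style score with a macro-averaged F1,
# assuming the commonly cited 0.25/0.75 weighting and four labels.
from sklearn.metrics import f1_score

LABELS = ["agree", "disagree", "discuss", "unrelated"]
RELATED = {"agree", "disagree", "discuss"}

def fnc1_score(gold, pred):
    # 0.25 for a correct related/unrelated decision,
    # a further 0.75 for the exact stance of a related pair.
    score = 0.0
    for g, p in zip(gold, pred):
        if (g in RELATED) == (p in RELATED):
            score += 0.25
        if g in RELATED and g == p:
            score += 0.75
    return score

def macro_f1(gold, pred):
    # Unweighted mean of per-class F1: minority classes such as
    # "disagree" count as much as the dominant "unrelated" class.
    return f1_score(gold, pred, labels=LABELS, average="macro", zero_division=0)

# Toy example with the class imbalance typical of the FNC-1 corpus.
gold = ["unrelated"] * 7 + ["discuss", "agree", "disagree"]
majority = ["unrelated"] * 10  # always predicts the majority class
print(fnc1_score(gold, majority) / fnc1_score(gold, gold))  # relative score ~ 0.37
print(macro_f1(gold, majority))                             # macro F1 ~ 0.21
```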
