Article Reranking by Memory-Enhanced Key Sentence Matching for Detecting Previously Fact-Checked Claims

False claims that have been previously fact-checked can still spread on social media. To mitigate their continual spread, detecting previously fact-checked claims is indispensable. Given a claim, existing works retrieve fact-checking articles (FC-articles) for detection and focus on reranking candidate articles in the typical two-stage retrieval framework. However, their performance may be limited because they ignore the following characteristics of FC-articles: (1) claims are often quoted to describe the checked events, providing lexical information in addition to semantics; and (2) sentence templates that introduce or debunk claims are common across articles, providing pattern information. In this paper, we propose a novel reranker, MTM (Memory-enhanced Transformers for Matching), to rank FC-articles using key sentences selected with event (lexical and semantic) and pattern information. For event information, we propose to fine-tune the Transformer with regression of ROUGE. For pattern information, we generate pattern vectors as a memory bank to match with the parts containing patterns. By fusing event and pattern information, we select key sentences to represent an article and then predict whether the article fact-checks the given claim using the claim, key sentences, and patterns. Experiments on two real-world datasets show that MTM outperforms existing methods. Human evaluation shows that MTM can capture key sentences for explanations. The code and the dataset are at https://github.com/ICTMCG/MTM.
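
To make the key-sentence idea concrete, the following is a minimal illustrative sketch and not the released MTM implementation. It assumes whitespace tokenization, uses a plain ROUGE-2 F1 between the claim and a sentence as the kind of lexical signal a ROUGE regression target could provide, and fuses an event score with a pattern score through a hypothetical weight `weight`. The function names `rouge_n_f1` and `select_key_sentences` are invented for this example; in MTM the scores come from the fine-tuned Transformer and the pattern memory bank rather than from hand-supplied lists.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(claim, sentence, n=2):
    """ROUGE-N F1 between a claim (reference) and a sentence (candidate).

    Assumption: simple lowercased whitespace tokenization.
    """
    ref = ngrams(claim.lower().split(), n)
    cand = ngrams(sentence.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as in both texts.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def select_key_sentences(event_scores, pattern_scores, k=2, weight=0.5):
    """Return indices of the top-k sentences by a fused score.

    Assumption: a linear fusion with a hypothetical `weight`; the paper's
    actual fusion may differ.
    """
    fused = [weight * e + (1 - weight) * p
             for e, p in zip(event_scores, pattern_scores)]
    ranked = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    claim = "drinking hot water cures the virus"
    sentences = [
        "A viral post says drinking hot water cures the virus.",
        "Our newsroom was founded in 2015.",
        "This claim is false, according to health experts.",
    ]
    event = [rouge_n_f1(claim, s) for s in sentences]
    # Hypothetical pattern scores, e.g. similarity to debunking templates.
    pattern = [0.2, 0.0, 0.9]
    print(select_key_sentences(event, pattern, k=2))
```

In this toy setup, a sentence that quotes the claim scores high on the event side, while a sentence built on a debunking template scores high on the pattern side; fusing the two lets both kinds of sentence represent the article.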
