Paper information - Combining exogenous and endogenous signals with a semi-supervised co-attention network for early detection of COVID-19 fake tweets

Fake tweets are spreading at an ever-increasing rate and demand immediate countermeasures. During COVID-19, tweets carrying misinformation should be flagged and neutralised in their early stages to mitigate the damage. Most existing methods for early detection of fake news assume that sufficient propagation information is available for a large set of labelled tweets, which may not be a realistic setting for cases like COVID-19, where both are largely absent. In this work, we present ENDEMIC, a novel early detection model that leverages exogenous and endogenous signals related to tweets while learning from limited labelled data. We first develop a novel dataset, ECTF (Early COVID-19 Twitter Fake news), with additional behavioural test sets to validate early detection. We build a heterogeneous graph with follower-followee, user-tweet, and tweet-retweet connections and train a graph embedding model to aggregate propagation information. Graph embeddings and contextual features constitute the endogenous signals, while time-relative web-scraped information constitutes the exogenous signals. ENDEMIC is trained in a semi-supervised fashion, overcoming the challenge of limited labelled data. We propose a co-attention mechanism to fuse the signal representations optimally. Experimental results on ECTF, PolitiFact, and GossipCop show that ENDEMIC is highly reliable in detecting fake tweets early, significantly outperforming nine state-of-the-art methods. © 2021, Springer Nature Switzerland AG.
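The abstract does not spell out how the co-attention fusion is computed, so the following is only a minimal sketch of one common formulation, not the ENDEMIC implementation. It assumes each signal is a small set of feature vectors, scores every endogenous/exogenous pair through a bilinear affinity, and pools each side by its strongest match on the other side. The names `co_attention`, `endo`, `exo`, and `w` are hypothetical, and random weights stand in for learned ones.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def co_attention(endo, exo, w):
    """Fuse two sets of vectors with a simplified co-attention.

    endo: n x d endogenous vectors (assumed: graph/context features)
    exo:  m x d exogenous vectors (assumed: web-scraped evidence)
    w:    d x d bilinear weight matrix (random here, learned in practice)
    Returns the fused vector (length 2d) and both attention distributions.
    """
    # affinity[i][j] = tanh(endo_i^T W exo_j)
    raw = matmul(matmul(endo, w), transpose(exo))
    affinity = [[math.tanh(v) for v in row] for row in raw]

    # Score each item by its best match on the other side, then normalise.
    attn_endo = softmax([max(row) for row in affinity])
    attn_exo = softmax([max(col) for col in zip(*affinity)])

    d = len(endo[0])
    pooled_endo = [sum(a * vec[k] for a, vec in zip(attn_endo, endo)) for k in range(d)]
    pooled_exo = [sum(a * vec[k] for a, vec in zip(attn_exo, exo)) for k in range(d)]

    # Concatenate the two attended summaries into one fused representation.
    return pooled_endo + pooled_exo, attn_endo, attn_exo


random.seed(0)
d = 4
endo = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]
exo = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
w = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(d)]

fused, attn_endo, attn_exo = co_attention(endo, exo, w)
print(len(fused), attn_endo, attn_exo)
```

In a trained model the fused vector would feed a classifier and `w` would be learned jointly with it; the max-pooling over the affinity matrix is just one of several pooling choices.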

Tanmoy Chakraborty | Nidhi | William Scott Paka | Shubhashis Sengupta | Rachit Bansal
