Rumor knowledge embedding based data augmentation for imbalanced rumor detection

Abstract Rumor detection aims to identify rumors in a timely manner, preventing malicious rumors from misleading the public and disrupting social order. However, rumor detection suffers from imbalanced data. Existing text generation and imbalanced learning methods are insufficient for addressing this imbalance because they are not tailored to rumor tasks. We propose a knowledge graph-based rumor data augmentation method, Graph Embedding-based Rumor Data Augmentation (GERDA), which simulates the generation process of rumors from the perspective of knowledge. To model the generation process of false information, we introduce knowledge representation into the process of text generation. To better learn the graph-structured rumor data, we propose a graph-based rumor text generative model, G2S-AT-GAN, which uses an attention-based graph convolutional neural network and a generative adversarial network for rumor text generation. Experiments show that our method is able to generate high-quality rumors on diverse topics, and that the generated rumors help mitigate rumor data imbalance and improve rumor detection performance.
