How Vulnerable Are Automatic Fake News Detection Methods to Adversarial Attacks?

As the spread of false information on the internet has increased dramatically in recent years, growing attention is being paid to automated fake news detection. Some detection methods are already quite successful. Nevertheless, the detection algorithms still have many vulnerabilities, because fake news publishers can structure and phrase their texts so that a detection algorithm does not flag them as fake news. This paper shows that it is possible to automatically attack state-of-the-art models trained to detect fake news, demonstrating their vulnerability. For this purpose, corresponding models were first trained on a dataset. Then, using TextAttack, an attempt was made to manipulate the trained models so that previously correctly identified fake news was classified as true news. The results show that fake news detection mechanisms can be bypassed automatically, which has implications for existing policy initiatives.
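The attack idea the abstract describes, perturbing a text until a trained classifier flips its label, can be illustrated with a minimal sketch. The detector, the synonym table, and the greedy search below are hypothetical toy stand-ins, not the paper's actual models or the TextAttack library; real attack recipes such as TextFooler follow the same greedy word-substitution pattern against a learned classifier.

```python
# Toy illustration of a greedy word-substitution attack, in the spirit of
# TextAttack-style recipes. Both the "detector" and the synonym table are
# hypothetical stand-ins for this sketch.

# Hypothetical synonym table: each sensationalist word maps to a
# neutral replacement that preserves the headline's meaning.
SYNONYMS = {
    "shocking": "surprising",
    "miracle": "remarkable",
    "secret": "little-known",
    "cure": "treatment",
}

def fake_score(text):
    """Toy detector: fraction of words that are sensationalist triggers."""
    triggers = {"shocking", "miracle", "secret", "cure"}
    words = text.lower().split()
    return sum(w in triggers for w in words) / max(len(words), 1)

def greedy_attack(text, threshold=0.2):
    """Greedily swap trigger words for synonyms, left to right, until the
    toy detector's score falls below its decision threshold."""
    words = text.split()
    for i, w in enumerate(words):
        if fake_score(" ".join(words)) < threshold:
            break  # already classified as "true news"; stop perturbing
        replacement = SYNONYMS.get(w.lower())
        if replacement:
            words[i] = replacement
    return " ".join(words)

headline = "shocking secret cure discovered by doctors"
adversarial = greedy_attack(headline)
print(headline, "->", adversarial)
print(fake_score(headline), fake_score(adversarial))
```

The original headline scores above the threshold and is flagged as fake; the perturbed version, which a human would read the same way, slips below it. This is the core vulnerability the paper exploits at scale with real models.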
