BERT Rankers are Brittle: A Study using Adversarial Document Perturbations

Contextual ranking models based on BERT are now well established for a wide range of passage and document ranking tasks. However, the robustness of BERT-based ranking models under adversarial inputs is under-explored. In this paper, we argue that BERT-rankers are not immune to adversarial attacks targeting the documents retrieved for a query. First, we propose algorithms that adversarially perturb both highly relevant and non-relevant documents using gradient-based optimization: they add or replace a small number of tokens in a document in order to cause a large rank demotion or promotion. Our experiments show that perturbing only a few tokens can already change a document's rank substantially. Moreover, we find that BERT-rankers rely heavily on the beginning of a document for relevance prediction, making its initial part more susceptible to adversarial attack. More interestingly, we find a small set of recurring adversarial words that, when added to documents, reliably demote relevant documents or promote non-relevant ones. Finally, our adversarial tokens also show distinct topic preferences within and across datasets, exposing potential biases inherited from BERT pre-training or the downstream datasets.
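To make the gradient-based perturbation idea concrete, below is a minimal, hedged sketch of a HotFlip-style single-token attack on a BERT cross-encoder ranker: it backpropagates the relevance score to the input embeddings and picks the (position, replacement word) pair whose first-order estimate most lowers the score. The checkpoint name, the `demote_one_token` helper, and the greedy one-token variant are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
# Sketch: gradient-guided (HotFlip-style) token replacement that demotes a document
# under a BERT cross-encoder ranker. Assumes a single relevance logit per (query, doc) pair.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative choice of ranker; any fine-tuned BERT cross-encoder would do here.
MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()


def demote_one_token(query: str, document: str) -> str:
    """Replace the one document token whose first-order (gradient) substitution
    is estimated to lower the relevance score the most; return the perturbed text."""
    enc = tokenizer(query, document, return_tensors="pt", truncation=True)
    input_ids = enc["input_ids"]

    # Embed the (query, document) sequence and track gradients on the embeddings.
    embedding_matrix = model.get_input_embeddings().weight            # [V, d]
    inputs_embeds = embedding_matrix[input_ids].detach().clone()      # [1, L, d]
    inputs_embeds.requires_grad_(True)

    kwargs = {"attention_mask": enc["attention_mask"]}
    if "token_type_ids" in enc:
        kwargs["token_type_ids"] = enc["token_type_ids"]
    score = model(inputs_embeds=inputs_embeds, **kwargs).logits.squeeze()
    score.backward()
    grad = inputs_embeds.grad[0]                                       # [L, d]

    # Restrict candidate positions to document tokens (between the two [SEP]s).
    sep_positions = (input_ids[0] == tokenizer.sep_token_id).nonzero().flatten()
    doc_mask = torch.zeros(input_ids.size(1), dtype=torch.bool)
    doc_mask[sep_positions[0] + 1 : sep_positions[-1]] = True

    # First-order score change of swapping position i to word v:
    # delta[i, v] ~= grad_i . (e_v - e_i); pick the most negative entry.
    delta = grad @ embedding_matrix.T                                  # [L, V]
    delta -= (grad * inputs_embeds[0].detach()).sum(-1, keepdim=True)
    delta[~doc_mask] = float("inf")          # never touch query/special positions
    delta[:, tokenizer.all_special_ids] = float("inf")  # never insert special tokens

    flat = delta.argmin()
    pos, new_token = flat // delta.size(1), flat % delta.size(1)

    perturbed = input_ids.clone()
    perturbed[0, pos] = new_token
    return tokenizer.decode(perturbed[0][doc_mask], skip_special_tokens=True)
```

The rank-promotion variant of this sketch simply takes `argmax` instead of `argmin`, and a multi-token attack repeats the greedy step until the desired number of tokens has been added or replaced.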
