Towards Imperceptible Document Manipulations against Neural Ranking Models

Adversarial attacks have gained traction as a means of identifying potential vulnerabilities in neural ranking models (NRMs), but current attack methods often introduce grammatical errors, nonsensical expressions, or incoherent text fragments that are easy to detect. They also rely heavily on a well-imitated surrogate NRM to guarantee the attack's effectiveness, which makes them difficult to apply in practice. To address these issues, we propose Imperceptible DocumEnt Manipulation (IDEM), a framework that produces adversarial documents which are less noticeable to both algorithms and humans. IDEM instructs a well-established generative language model, such as BART, to generate connection sentences without introducing easy-to-detect errors, and employs a separate, position-wise merging strategy to balance the relevance and coherence of the perturbed text. Experimental results on the popular MS MARCO benchmark show that IDEM outperforms strong baselines while preserving the fluency and correctness of the target documents, as evidenced by both automatic and human evaluations. Furthermore, because adversarial text generation is decoupled from the surrogate NRM, IDEM is more robust and less affected by the quality of the surrogate NRM.
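
The sketch below illustrates the two-stage idea described in the abstract; it is not the authors' implementation. It assumes the HuggingFace transformers and sentence-transformers libraries, uses an off-the-shelf MS MARCO cross-encoder as a stand-in for the surrogate NRM, and the alpha weight, the query-prefix prompt, and the helper names (infill_window, coherence, idem_attack) are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of an IDEM-style attack pipeline (not the authors' released
# code). Assumptions: an off-the-shelf MS MARCO cross-encoder stands in for
# the surrogate NRM, and `alpha` is a hypothetical knob for the
# relevance/coherence trade-off rather than the paper's exact merging rule.
import torch
from transformers import (BartForConditionalGeneration, BartTokenizer,
                          GPT2LMHeadModel, GPT2Tokenizer)
from sentence_transformers import CrossEncoder

bart = BartForConditionalGeneration.from_pretrained("facebook/bart-large").eval()
bart_tok = BartTokenizer.from_pretrained("facebook/bart-large")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").eval()
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # surrogate stand-in


def infill_window(query, prev_sent, next_sent):
    """Generate a connection sentence via BART text infilling. Prepending the
    query steers the fill toward query terms (a simplification of IDEM's
    query-aware prompting). BART decodes the whole window, so the query
    prefix is stripped and the remainder is spliced back into the document."""
    src = f"{query}. {prev_sent} <mask> {next_sent}".strip()
    ids = bart_tok(src, return_tensors="pt", truncation=True).input_ids
    with torch.no_grad():
        out = bart.generate(ids, num_beams=4, max_length=192)
    text = bart_tok.decode(out[0], skip_special_tokens=True)
    return text.split(query, 1)[-1].lstrip(". ").strip()


def coherence(text):
    """Mean token log-likelihood under GPT-2, a cheap fluency proxy."""
    ids = gpt2_tok(text, return_tensors="pt", truncation=True,
                   max_length=512).input_ids
    with torch.no_grad():
        loss = gpt2(ids, labels=ids).loss  # mean negative log-likelihood
    return -loss.item()


def idem_attack(query, sentences, alpha=0.5):
    """Position-wise merging: try a connection sentence at every insertion
    point and keep the candidate with the best weighted score. Ranker logits
    and log-likelihoods live on different scales, so in practice both would
    be normalized before mixing."""
    best_doc, best_score = None, float("-inf")
    for i in range(len(sentences) + 1):
        prev_s = sentences[i - 1] if i > 0 else ""
        next_s = sentences[i] if i < len(sentences) else ""
        window = infill_window(query, prev_s, next_s)
        left = sentences[:i - 1] if i > 0 else []
        candidate = " ".join(left + [window] + sentences[i + 1:])
        relevance = float(ranker.predict([(query, candidate)])[0])
        score = alpha * relevance + (1 - alpha) * coherence(candidate)
        if score > best_score:
            best_doc, best_score = candidate, score
    return best_doc
```

Note how, under these assumptions, the surrogate ranker only scores complete, already-fluent candidates; it never drives token-level edits. This separation is what makes the approach less sensitive to surrogate quality: a weaker surrogate degrades candidate selection, not the fluency of the generated text.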
