Topic-oriented Adversarial Attacks against Black-box Neural Ranking Models

Neural ranking models (NRMs) have attracted considerable attention in information retrieval. Unfortunately, NRMs may inherit the adversarial vulnerabilities of general neural networks, which black-hat search engine optimization practitioners could exploit. Recently, adversarial attacks against NRMs have been explored in the paired attack setting, where an adversarial perturbation is generated for a target document with respect to a single, specific query. In this paper, we focus on a more general type of perturbation and introduce the topic-oriented adversarial ranking attack task against NRMs, which aims to find an imperceptible perturbation that promotes a target document in the rankings for a group of queries sharing the same topic. We define both static and dynamic settings for the task and focus on decision-based black-box attacks. We propose a novel framework, built on a surrogate ranking model, to improve topic-oriented attack performance. The attack problem is formalized as a Markov decision process (MDP) and solved using reinforcement learning. Specifically, a topic-oriented reward function guides the policy toward a successful adversarial example, one that is promoted in the rankings for as many queries in the group as possible. Experimental results demonstrate that the proposed framework significantly outperforms existing attack strategies, and these findings underline the potential risks of deploying NRMs in the real world.
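The topic-oriented reward is what separates this task from the paired setting: a perturbation is judged against every query in the topic group rather than against a single query. The following is a minimal sketch under two stated assumptions: (1) in the decision-based black-box setting the attacker observes only ranked lists of document IDs, and (2) the reward is simply the fraction of queries in the group for which the target document moves up. The helper names `rank_of` and `topic_oriented_reward` are hypothetical, and the paper's actual reward function may be defined differently (for example, by weighting the size of each rank change).

```python
from typing import Sequence


def rank_of(ranked_doc_ids: Sequence[str], target_id: str) -> int:
    """Hypothetical helper: 1-based rank of the target document in a
    ranked list returned by the black-box model (decision-based output,
    so only the ordering is observed, never the relevance scores)."""
    return ranked_doc_ids.index(target_id) + 1


def topic_oriented_reward(
    original_ranks: Sequence[int],
    perturbed_ranks: Sequence[int],
) -> float:
    """Hypothetical topic-oriented reward: the fraction of queries in a
    topic group for which the perturbed target document is ranked
    strictly higher (a smaller rank number) than the original one."""
    if len(original_ranks) != len(perturbed_ranks):
        raise ValueError("rank lists must cover the same queries")
    if not original_ranks:
        return 0.0
    # Count queries where the perturbation promoted the target document.
    promoted = sum(
        1 for before, after in zip(original_ranks, perturbed_ranks)
        if after < before
    )
    return promoted / len(original_ranks)


if __name__ == "__main__":
    # Ranks of the target document for four queries of the same topic.
    before = [10, 25, 7, 40]
    after = [4, 25, 9, 12]
    print(topic_oriented_reward(before, after))
    print(rank_of(["d3", "d1", "d7"], "d7"))
```

In this toy group of four queries, the target moves up for the first and last queries only, so the reward is 0.5. Counting promoted queries, rather than summing rank gains, keeps one large jump on a single query from dominating the signal, which fits the stated goal of promoting the document for as many queries in the group as possible.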
