Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information

Recent research on reinforcement learning (RL) has suggested that trained agents are vulnerable to maliciously crafted adversarial samples. In this work, we show how such samples can be generalised from white-box and grey-box attacks to a strong black-box case, where the attacker has no knowledge of the agents, their training parameters, or their training methods. We use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will take. First, we show that our approximation model, based on time-series information from the agent, consistently predicts RL agents' future actions with high accuracy in a black-box setup across a wide range of games and RL algorithms. Second, we find that although adversarial samples are transferable from the sequence-to-sequence model to our RL agents, they often outperform random Gaussian noise only marginally. Third, we propose a novel use for adversarial samples in black-box attacks on RL agents: they can be used to trigger a trained agent to misbehave after a specific time delay. This potentially enables an attacker to use devices controlled by RL agents as time bombs.
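
The approximation step described above can be made concrete with a short sketch: a small sequence-to-sequence model is fitted to the victim agent's observed behaviour, then used as a surrogate for crafting attacks. The LSTM encoder, the fixed prediction horizon, and all names and shapes below are illustrative assumptions; the abstract does not specify the paper's actual architecture or hyperparameters.

```python
# Hypothetical sketch of the black-box approximation model: an LSTM
# sequence-to-sequence predictor trained on watched (observation, action)
# rollouts. Architecture and hyperparameters are assumptions, not the
# paper's reported setup.
import torch
import torch.nn as nn

class ActionPredictor(nn.Module):
    """Predicts the agent's next `horizon` discrete actions from a window
    of its recent observations (the time-series information)."""
    def __init__(self, obs_dim: int, n_actions: int,
                 hidden: int = 128, horizon: int = 4):
        super().__init__()
        self.horizon, self.n_actions = horizon, n_actions
        self.encoder = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions * horizon)

    def forward(self, obs_window: torch.Tensor) -> torch.Tensor:
        # obs_window: (batch, T, obs_dim) -- flattened observations over time
        _, (h, _) = self.encoder(obs_window)
        logits = self.head(h[-1])            # (batch, horizon * n_actions)
        return logits.view(-1, self.horizon, self.n_actions)

def fit(model: ActionPredictor, rollouts, epochs: int = 10, lr: float = 1e-3):
    """Supervised fitting: the attacker only observes the agent acting,
    never its parameters or training method."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs_window, future_actions in rollouts:
            # future_actions: (batch, horizon) long tensor of action indices
            logits = model(obs_window)
            loss = loss_fn(logits.flatten(0, 1), future_actions.flatten())
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Once fitted, perturbations crafted against this surrogate can be applied to the real agent's observations, relying on transferability; targeting predicted actions several steps ahead is one way the delayed-trigger ("time bomb") idea in the abstract could be realised.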
