Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control

Deep reinforcement learning has achieved great success on many previously difficult reinforcement learning tasks, yet recent studies show that deep RL agents are vulnerable to adversarial perturbations, much like deep neural networks in classification tasks. Prior work mostly focuses on model-free adversarial attacks and agents with discrete actions. In this work, we study adversarial attacks on continuous control agents in deep RL and propose the first two-step attack algorithm based on a learned model of the environment dynamics. Extensive experiments on several MuJoCo domains (Cartpole, Fish, Walker, Humanoid) demonstrate that our proposed framework is substantially more effective and efficient than model-free attack baselines at degrading agent performance and at driving agents into unsafe states.
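
To make the abstract's two-step recipe concrete (first learn a dynamics model, then use it to craft perturbations), here is a minimal, hypothetical sketch, not the authors' actual implementation: `policy`, `dynamics`, `s_unsafe`, and all dimensions are illustrative stand-ins. It optimizes a bounded observation perturbation so that, according to the learned model, the action taken under the perturbed observation drives the next state toward an unsafe target.

```python
# Hypothetical sketch of a model-based attack on a continuous-control agent.
# `policy` and `dynamics` are placeholder networks; in practice the policy
# would be a trained agent and `dynamics` would be fit on collected
# trajectories (step 1 of the two-step algorithm described in the abstract).
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2

# Stand-in for a trained deterministic policy: observation -> action.
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, action_dim))
# Stand-in for a learned dynamics model: (state, action) -> next state.
dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                         nn.Linear(64, state_dim))

def attack_observation(s, s_target, eps=0.1, steps=20, lr=0.02):
    """Step 2: find a perturbation delta with ||delta||_inf <= eps so that
    the model-predicted next state moves toward the unsafe target."""
    delta = torch.zeros_like(s, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        a = policy(s + delta)                     # action under perturbed obs
        s_next = dynamics(torch.cat([s, a], -1))  # model-predicted next state
        loss = ((s_next - s_target) ** 2).sum()   # distance to unsafe state
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)               # project onto the eps-ball
    return delta.detach()

s = torch.randn(state_dim)
s_unsafe = torch.randn(state_dim)  # illustrative "unsafe" target state
print(attack_observation(s, s_unsafe))
```

Running the attack repeatedly along a rollout, one perturbation per time step, is what would degrade episode reward or steer the agent into unsafe regions; the sketch above shows only the inner per-step optimization.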
