Certified Adversarial Robustness for Deep Reinforcement Learning

Deep neural network-based systems are now the state of the art in many robotics tasks, but their application in safety-critical domains remains dangerous without formal guarantees on network robustness. Small perturbations to sensor inputs (from noise or adversarial examples) are often enough to change network-based decisions, and have already been shown to cause an autonomous vehicle to swerve into oncoming traffic. In light of these dangers, numerous algorithms have been developed as defensive mechanisms against these adversarial inputs, some of which provide formal robustness guarantees or certificates. This work leverages research on certified adversarial robustness to develop an online certified defense for deep reinforcement learning algorithms. The proposed defense computes guaranteed lower bounds on state-action values during execution, then identifies and chooses the action that is optimal under the worst-case deviation in input space due to possible adversaries or noise. The approach is demonstrated on a Deep Q-Network policy and is shown to increase robustness to noise and adversaries in pedestrian collision avoidance scenarios and a classic control task.
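To make the idea concrete, below is a minimal illustrative sketch (not the authors' implementation) of certified robust action selection: elementwise interval bound propagation is used here as a simple stand-in for the tighter certification methods in the literature to obtain guaranteed lower bounds on each Q-value over an l-infinity ball around the observed state, and the agent picks the action whose worst-case (lower-bound) Q-value is largest. The network sizes, epsilon, and helper names are illustrative assumptions.

```python
# Sketch of robust action selection via certified lower bounds on Q-values.
# Assumes a ReLU MLP Q-network; IBP is a loose but sound bound used for clarity.
import numpy as np

def ibp_bounds(weights, biases, lower, upper):
    """Propagate elementwise input bounds through a ReLU MLP.

    weights/biases: per-layer (W, b) with W of shape (out, in); ReLU is applied
    after every layer except the last. Returns (lower, upper) bounds on outputs.
    """
    for i, (W, b) in enumerate(zip(weights, biases)):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        new_lower = W_pos @ lower + W_neg @ upper + b
        new_upper = W_pos @ upper + W_neg @ lower + b
        if i < len(weights) - 1:  # ReLU on hidden layers only
            new_lower = np.maximum(new_lower, 0.0)
            new_upper = np.maximum(new_upper, 0.0)
        lower, upper = new_lower, new_upper
    return lower, upper

def robust_action(weights, biases, state, epsilon):
    """Pick the action with the largest certified lower bound on Q(s', a)
    for any perturbed state s' with ||s' - state||_inf <= epsilon."""
    q_lower, _ = ibp_bounds(weights, biases, state - epsilon, state + epsilon)
    return int(np.argmax(q_lower))

# Tiny example: random weights stand in for a trained DQN with 2 discrete actions.
rng = np.random.default_rng(0)
dims = [4, 16, 16, 2]
weights = [rng.normal(size=(dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
biases = [np.zeros(dims[i + 1]) for i in range(len(dims) - 1)]
state = rng.normal(size=4)
print("nominal greedy action:", robust_action(weights, biases, state, 0.0))
print("robust action (eps=0.1):", robust_action(weights, biases, state, 0.1))
```

With epsilon set to zero the lower bounds collapse to the nominal Q-values and the rule reduces to ordinary greedy action selection, so the same routine covers both the undefended and the certified-defense cases.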
