Virtuously Safe Reinforcement Learning

We show that when a third party, the adversary, steps into the two-party setting (agent and operator) of safely interruptible reinforcement learning, a trade-off must be made between the probability of following the optimal policy in the limit and the probability of escaping a dangerous situation created by the adversary. Previous work on safely interruptible agents has assumed that the agent perceives its environment perfectly (no adversary), and has therefore implicitly set the second probability to zero by explicitly requiring the first probability to equal one. We show that (1) agents can be made both interruptible and adversary-resilient, and (2) the interruptibility can be made safe in the sense that the agent itself will not seek to avoid it. We also address the problem that arises when the agent does not become fully greedy, i.e., safe exploration in the limit. Resilience to perturbed perception, safe exploration in the limit, and safe interruptibility are the three pillars of what we call \emph{virtuously safe reinforcement learning}.
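To make the notion of safe interruptibility concrete, the following minimal sketch (ours, not the paper's construction) shows an off-policy Q-learning agent on a toy chain task whose proposed action can be overridden by an operator's interruption. Because Q-learning bootstraps from the greedy value of the next state rather than from the action the behaviour policy actually executes, occasional interruptions leave the learned values, and hence the limiting greedy policy, unbiased. The chain environment, the 10% interruption rule, the safe action, and all hyperparameters below are illustrative assumptions.

import numpy as np

N_STATES, N_ACTIONS = 5, 2   # chain world: action 0 = left, action 1 = right
SAFE_ACTION = 0              # action forced on the agent while interrupted
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1
MAX_STEPS = 100              # hard cap so every episode terminates

def step(state, action):
    """Toy dynamics: reaching the right end of the chain pays reward 1."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def greedy(q_row, rng):
    """Greedy action with random tie-breaking."""
    best = np.flatnonzero(q_row == q_row.max())
    return int(rng.choice(best))

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))

for episode in range(500):
    state, done = 0, False
    for _ in range(MAX_STEPS):
        # epsilon-greedy proposal by the agent
        action = int(rng.integers(N_ACTIONS)) if rng.random() < EPSILON else greedy(Q[state], rng)
        # the operator may interrupt (here: a random 10% chance as a stand-in)
        # and override the proposal with the safe action
        if rng.random() < 0.1:
            action = SAFE_ACTION
        next_state, reward, done = step(state, action)
        # off-policy update: the target uses the greedy value at next_state,
        # not whatever the (possibly interrupted) behaviour does there
        target = reward + GAMMA * Q[next_state].max() * (not done)
        Q[state, action] += ALPHA * (target - Q[state, action])
        state = next_state
        if done:
            break

print(np.round(Q, 2))   # the greedy policy in the limit still heads right

This sketch only illustrates the interruption side; in the adversarial setting studied here, the agent's perception of the state itself may be perturbed, which is what forces the trade-off stated above.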
