How Private Is Your RL Policy? An Inverse RL Based Analysis Framework

Reinforcement Learning (RL) enables agents to learn to perform tasks from scratch. In domains such as autonomous driving and recommendation systems, the learned policy can cause a privacy breach if it memorizes any part of a private reward function. We study existing differentially private RL policies derived from several RL algorithms, including Value Iteration, Deep Q-Networks, and vanilla Proximal Policy Optimization. We propose a Privacy-Aware Inverse RL (PRIL) analysis framework that mounts reward reconstruction as an adversarial attack on the private policies an agent may deploy. To this end, we introduce the reward reconstruction attack, in which an adversary attempts to recover the original reward from a privacy-preserving policy using an Inverse RL algorithm. If the agent's policy is tightly private, the adversary should do poorly at this reconstruction. Using the framework, we empirically test the effectiveness of the privacy guarantees offered by these private algorithms on multiple instances of the FrozenLake domain of varying complexity. We quantify how well each private policy protects the reward function by measuring the distance between the original and reconstructed rewards, and from this analysis we infer a gap between the level of privacy currently offered and the level needed to protect reward functions in RL.
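The attack pipeline described above can be illustrated with a small end-to-end sketch: observe a (privacy-perturbed) policy, run an IRL procedure against it, and measure how far the recovered reward is from the true one. The snippet below is a minimal, hypothetical sketch only; it uses a toy chain MDP as a stand-in for FrozenLake, a uniform-mixing noise model as a crude proxy for differentially private policy perturbation, and a tabular MaxEnt-style IRL update, none of which are claimed to match the paper's exact implementation.

```python
# Hypothetical sketch of a PRIL-style reward reconstruction attack.
# Toy chain MDP, noisy policy as a proxy for a DP policy, MaxEnt-style IRL.
import numpy as np

# --- Toy MDP: 1D chain with a goal state on the right -----------------------
n_states, n_actions, gamma, horizon = 8, 2, 0.95, 20
true_reward = np.zeros(n_states)
true_reward[-1] = 1.0                      # "private" reward: +1 at the goal

# Deterministic transitions: action 0 = left, action 1 = right
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, n_states - 1)] = 1.0

def soft_value_iteration(reward, iters=200):
    """Soft-optimal (MaxEnt) policy for a state-only reward."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = reward[:, None] + gamma * (P @ V)          # (S, A)
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))  # soft max
    return np.exp(Q - V[:, None])                      # stochastic policy (S, A)

def state_visitation(policy):
    """Expected discounted state-visitation frequencies from state 0."""
    d = np.zeros(n_states)
    mu = np.zeros(n_states)
    mu[0] = 1.0
    for t in range(horizon):
        d += (gamma ** t) * mu
        mu = np.einsum('s,sa,san->n', mu, policy, P)
    return d / d.sum()

# --- "Private" policy the adversary observes --------------------------------
# Crude stand-in for a DP policy: soft-optimal policy on the true reward,
# mixed with uniform action noise.
epsilon_mix = 0.2
private_policy = ((1 - epsilon_mix) * soft_value_iteration(true_reward)
                  + epsilon_mix / n_actions)

# --- Reward reconstruction attack: MaxEnt-style IRL on the observed policy --
target_svf = state_visitation(private_policy)  # statistics the adversary matches
theta = np.zeros(n_states)                     # reconstructed reward
lr = 0.5
for _ in range(200):
    model_svf = state_visitation(soft_value_iteration(theta))
    theta += lr * (target_svf - model_svf)     # feature-matching gradient step

# --- Quantify leakage: distance between true and reconstructed rewards ------
def normalize(r):
    r = r - r.min()
    return r / (r.max() + 1e-12)

distance = np.linalg.norm(normalize(true_reward) - normalize(theta))
print(f"reward reconstruction distance: {distance:.3f}")
```

Under this sketch, a smaller distance means the adversary recovered more of the private reward, i.e. the policy leaks more; sweeping the noise level (here `epsilon_mix`, in the paper the DP budget) traces out the privacy-versus-reconstruction trade-off the framework is meant to expose.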
