Reinforcement Learning on Encrypted Data

The growing number of real-world applications of Reinforcement Learning (RL) has led to the development of privacy-preserving techniques, owing to the inherently sensitive nature of the data involved. Most existing work focuses on differential privacy, where information is revealed in the clear to an agent whose learned model should be robust against information leakage to malicious third parties. Motivated by use cases in which only encrypted data can be shared, such as information from sensitive sites, we consider scenarios in which the inputs themselves are sensitive and cannot be revealed in the clear. We develop a simple extension of the MDP framework that provides for the encryption of states. We present a preliminary experimental study of how a DQN agent trained on encrypted states performs in environments with discrete and continuous state spaces. Our results show that the agent can still learn in small state spaces, even in the presence of non-deterministic encryption, but that performance collapses in more complex environments.
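The encrypted-state setting described above can be pictured as an observation transform the agent cannot invert. The sketch below is only an illustration of the idea, not the paper's implementation: it uses a keyed PRF (HMAC-SHA256) from the Python standard library as a stand-in for a real block cipher such as AES, with a random nonce mimicking non-deterministic encryption. The names `encrypt_state`, `to_features`, and `KEY` are hypothetical.

```python
import hashlib
import hmac
import os
import struct

KEY = b"demo-secret-key"  # hypothetical shared key; a real system would use a proper cipher


def encrypt_state(state: int, deterministic: bool = True) -> bytes:
    """Map a discrete state to a fixed-size 'ciphertext'.

    Deterministic mode: the same state always yields the same bytes,
    so state identity is preserved and a tabular or DQN agent can
    still distinguish states. Non-deterministic mode prepends a
    random nonce, so repeated visits to one state look different.
    """
    msg = struct.pack(">q", state)  # encode the state as 8 big-endian bytes
    if deterministic:
        return hmac.new(KEY, msg, hashlib.sha256).digest()
    nonce = os.urandom(8)
    return nonce + hmac.new(KEY, nonce + msg, hashlib.sha256).digest()


def to_features(ciphertext: bytes) -> list[float]:
    """Turn ciphertext bytes into floats in [0, 1] for a DQN input layer."""
    return [b / 255.0 for b in ciphertext]
```

Under deterministic encryption the mapping from states to ciphertexts is injective, which is consistent with the finding that learning survives in small state spaces; non-deterministic encryption destroys this identity, forcing the agent to generalize over an essentially random input distribution.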
