Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users’ private and sensitive data. To protect users’ privacy, privacy-preserving RL algorithms are in demand. In this paper, we study RL with linear function approximation and local differential privacy (LDP) guarantees. We propose a novel (ε, δ)-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs, and obtain an
