Multi-Armed Bandits with Local Differential Privacy

This paper investigates regret minimization for multi-armed bandit (MAB) problems under a local differential privacy (LDP) guarantee. In stochastic bandit systems, the rewards may correspond to users' activities, which can involve private information that users do not want the agent to learn. In many cases, however, the agent needs to know these activities to provide better services such as recommendations and news feeds. To resolve this dilemma, we adopt local differential privacy and study regret upper and lower bounds for MAB algorithms that satisfy a given LDP guarantee. We prove a regret lower bound and propose algorithms whose regret upper bounds match it up to constant factors. Numerical experiments also confirm our conclusions.
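The abstract does not specify the privacy mechanism or the algorithms, so the following is only a minimal sketch of the setting, not the paper's method. It assumes Bernoulli rewards in [0, 1] and uses the Laplace mechanism as each user's local randomizer: since a reward in [0, 1] has sensitivity 1, adding Laplace noise of scale 1/ε makes each released value ε-LDP while keeping it an unbiased estimate of the true reward. The agent then runs a UCB-style rule on the privatized rewards only. The names `privatize_reward` and `ldp_ucb`, and the (1 + 2/ε) inflation of the exploration bonus, are hypothetical choices made for illustration rather than constants derived in the paper.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_reward(reward: float, epsilon: float, rng: random.Random) -> float:
    """User-side randomizer (assumed Laplace mechanism) for a reward in [0, 1].

    Sensitivity is 1, so noise of scale 1/epsilon gives epsilon-LDP.
    The output is unbiased: its expectation equals `reward`.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward must lie in [0, 1]")
    return reward + laplace_noise(1.0 / epsilon, rng)


def ldp_ucb(means, epsilon, horizon, seed=0):
    """Hypothetical UCB-style agent that only ever observes privatized rewards.

    `means` are Bernoulli arm means for a toy instance. The bonus widens the
    usual UCB radius by (1 + 2/epsilon) to account for the extra Laplace noise;
    this scaling is an illustrative assumption. Returns per-arm pull counts.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    width = 1.0 + 2.0 / epsilon
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize the estimates
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + width * math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        raw = 1.0 if rng.random() < means[arm] else 0.0  # user's private reward
        sums[arm] += privatize_reward(raw, epsilon, rng)  # agent sees only this
        counts[arm] += 1
    return counts
```

For example, `ldp_ucb([0.9, 0.1], epsilon=1.0, horizon=20000)` should pull the first arm far more often than the second. A smaller ε adds more noise and widens the bonus, so exploration lasts longer, which is the intuition behind regret bounds that degrade as the privacy guarantee tightens.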
