Interactive and Concentrated Differential Privacy for Bandits

Bandits play a crucial role in interactive learning schemes and modern recommender systems. However, these systems often rely on sensitive user data, making privacy a critical concern. This paper investigates privacy in bandits with a trusted centralized decision-maker through the lens of interactive Differential Privacy (DP). While bandits under pure $\epsilon$-global DP have been well-studied, we contribute to the understanding of bandits under zero Concentrated DP (zCDP). We provide minimax and problem-dependent lower bounds on regret for finite-armed and linear bandits, which quantify the cost of $\rho$-global zCDP in these settings. These lower bounds reveal two hardness regimes based on the privacy budget $\rho$ and suggest that $\rho$-global zCDP incurs less regret than pure $\epsilon$-global DP. We propose two $\rho$-global zCDP bandit algorithms, AdaC-UCB and AdaC-GOPE, for finite-armed and linear bandits respectively. Both algorithms use a common recipe of Gaussian mechanism and adaptive episodes. We analyze the regret of these algorithms to show that AdaC-UCB achieves the problem-dependent regret lower bound up to multiplicative constants, while AdaC-GOPE achieves the minimax regret lower bound up to poly-logarithmic factors. Finally, we provide experimental validation of our theoretical results under different settings.
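As a rough illustration of the Gaussian-mechanism ingredient mentioned above, the sketch below releases a noisy mean of one episode's rewards under $\rho$-zCDP. It relies on the standard fact that adding Gaussian noise of standard deviation $\sigma$ to a statistic with $\ell_2$-sensitivity $\Delta$ satisfies $(\Delta^2 / 2\sigma^2)$-zCDP, so $\sigma = \Delta / \sqrt{2\rho}$ suffices. This is not the AdaC-UCB or AdaC-GOPE algorithm itself: the function names, the assumption that rewards lie in $[0, 1]$, and the assumption that each reward is used in only one episode are all illustrative choices not taken from the abstract.

```python
import math
import random


def gaussian_sigma(sensitivity: float, rho: float) -> float:
    """Noise scale for which the Gaussian mechanism satisfies rho-zCDP.

    Uses the standard relation rho = sensitivity**2 / (2 * sigma**2).
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    return sensitivity / math.sqrt(2.0 * rho)


def private_episode_mean(rewards, rho: float, rng: random.Random) -> float:
    """Noisy empirical mean of one episode's rewards (hypothetical helper).

    Assumes every reward lies in [0, 1], so changing one reward moves the
    mean of n rewards by at most 1 / n (its sensitivity).
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("episode must contain at least one reward")
    sigma = gaussian_sigma(1.0 / n, rho)
    return sum(rewards) / n + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    episode = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0]
    for rho in (0.01, 1.0, 100.0):
        estimate = private_episode_mean(episode, rho, rng)
        print(f"rho={rho:>6}: noisy mean = {estimate:.4f}")
```

Because the noise scale shrinks as $1 / (n\sqrt{2\rho})$, longer episodes or larger budgets leave the estimate close to the non-private mean, which is consistent with the abstract's description of two hardness regimes depending on $\rho$.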
