Paper Information - Interactive and Concentrated Differential Privacy for Bandits

Interactive and Concentrated Differential Privacy for Bandits

Bandits play a crucial role in interactive learning schemes and modern recommender systems. However, these systems often rely on sensitive user data, making privacy a critical concern. This paper investigates privacy in bandits with a trusted centralized decision-maker through the lens of interactive Differential Privacy (DP). While bandits under pure $\epsilon$-global DP have been well-studied, we contribute to the understanding of bandits under zero Concentrated DP (zCDP). We provide minimax and problem-dependent lower bounds on regret for finite-armed and linear bandits, which quantify the cost of $\rho$-global zCDP in these settings. These lower bounds reveal two hardness regimes based on the privacy budget $\rho$ and suggest that $\rho$-global zCDP incurs less regret than pure $\epsilon$-global DP. We propose two $\rho$-global zCDP bandit algorithms, AdaC-UCB and AdaC-GOPE, for finite-armed and linear bandits respectively. Both algorithms use a common recipe of Gaussian mechanism and adaptive episodes. We analyze the regret of these algorithms to show that AdaC-UCB achieves the problem-dependent regret lower bound up to multiplicative constants, while AdaC-GOPE achieves the minimax regret lower bound up to poly-logarithmic factors. Finally, we provide experimental validation of our theoretical results under different settings.
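As a rough illustration of the Gaussian-mechanism ingredient mentioned above, the following sketch releases a noisy mean of one episode's rewards under $\rho$-zCDP. It is not the authors' AdaC-UCB or AdaC-GOPE. It relies on the standard result that Gaussian noise with standard deviation $\sigma = \Delta / \sqrt{2\rho}$ gives $\rho$-zCDP for a query of sensitivity $\Delta$, and on the standard conversion from $\rho$-zCDP to $(\epsilon, \delta)$-DP (Bun and Steinke, 2016). The function names, the assumption that rewards lie in $[0, 1]$, and the assumption that neighboring inputs differ in a single reward are all illustrative choices, not taken from the paper.

```python
import math
import random


def gaussian_sigma_for_zcdp(sensitivity: float, rho: float) -> float:
    """Noise standard deviation for which the Gaussian mechanism is rho-zCDP.

    Uses sigma = sensitivity / sqrt(2 * rho).
    """
    if sensitivity <= 0 or rho <= 0:
        raise ValueError("sensitivity and rho must be positive")
    return sensitivity / math.sqrt(2.0 * rho)


def private_episode_mean(rewards, rho, rng=None):
    """Noisy mean of one episode's rewards (hypothetical helper).

    Assumes rewards lie in [0, 1] (they are clipped to enforce this), so
    changing a single reward moves the mean of n rewards by at most 1 / n.
    """
    rng = rng or random.Random()
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one reward")
    clipped = [min(1.0, max(0.0, r)) for r in rewards]
    sigma = gaussian_sigma_for_zcdp(1.0 / n, rho)
    return sum(clipped) / n + rng.gauss(0.0, sigma)


def zcdp_to_approx_dp(rho: float, delta: float) -> float:
    """Epsilon such that rho-zCDP implies (epsilon, delta)-DP."""
    if rho <= 0 or not 0 < delta < 1:
        raise ValueError("need rho > 0 and 0 < delta < 1")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


if __name__ == "__main__":
    rng = random.Random(0)
    episode = [rng.random() for _ in range(200)]
    print("noisy mean:", private_episode_mean(episode, rho=0.5, rng=rng))
    print("epsilon at delta=1e-5:", zcdp_to_approx_dp(0.5, 1e-5))
```

In this sketch the noise scale shrinks as $1/(n\sqrt{2\rho})$, so longer episodes pay less for privacy per estimate. That is one intuition for combining the mechanism with episodes, though the paper's own analysis should be consulted for the actual episode schedule.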

D. Basu | Achraf Azize
