Differentially Private Multi-Armed Bandits in the Shuffle Model

We give an $(\varepsilon, \delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model with a distribution-dependent regret of $O\left(\left(\sum_{a \in [k]:\, \Delta_a > 0} \frac{\log T}{\Delta_a}\right) + \frac{k\sqrt{\log(1/\delta)}\,\log T}{\varepsilon}\right)$ and a distribution-independent regret of $O\left(\sqrt{kT\log T} + \frac{k\sqrt{\log(1/\delta)}\,\log T}{\varepsilon}\right)$, where $T$ is the number of rounds, $\Delta_a$ is the suboptimality gap of arm $a$, and $k$ is the total number of arms. Our upper bound almost matches the regret of the best known algorithms for the centralized model and significantly outperforms the best known algorithm in the local model.
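
To make the shuffle model concrete, here is a minimal Python sketch under stated assumptions. It is not the algorithm from this work. Each user applies binary randomized response to its reward bit, a shuffler uniformly permutes the reports, and the analyzer debiases their sum. A batched successive-elimination loop then uses these private sums as arm-mean estimates. The functions `local_randomizer`, `shuffle_private_sum`, and `private_elimination`, the flip probability `p`, and the confidence-width constants are all hypothetical choices made for illustration. The privacy level is not calibrated to any stated (ε, δ) guarantee.

```python
import math
import random


def local_randomizer(bit, p, rng):
    """Binary randomized response: keep the true bit w.p. 1 - p, else send a fair coin."""
    if rng.random() < p:
        return rng.randint(0, 1)
    return bit


def shuffle_private_sum(bits, eps, delta, rng):
    """Estimate sum(bits) from shuffled randomized-response reports.

    ASSUMPTION: the flip probability p ~ log(1/delta) / (eps^2 * n) only mirrors the
    order of magnitude used in shuffle-model binary summation. The constant is
    illustrative, and this is NOT a calibrated (eps, delta) guarantee.
    """
    n = len(bits)
    # Capped at 1/2 so the debiasing step below stays well-conditioned.
    p = min(0.5, math.log(1 / delta) / (eps ** 2 * n))
    reports = [local_randomizer(b, p, rng) for b in bits]
    rng.shuffle(reports)  # the shuffler: the analyzer sees only an anonymous multiset
    # Each report has mean (1 - p) * bit + p / 2, so invert that affine map.
    return (sum(reports) - p * n / 2) / (1 - p)


def private_elimination(means, horizon, eps, delta, seed=0):
    """Hypothetical batched successive elimination on Bernoulli arms.

    Each epoch pulls every active arm `batch` times with fresh rewards (so each
    reward enters exactly one private sum), then drops arms whose upper confidence
    bound falls below the best lower confidence bound, and doubles `batch`.
    Returns (pseudo-regret, surviving arms).
    """
    rng = random.Random(seed)
    best = max(means)
    active = list(range(len(means)))
    t, regret, batch = 0, 0.0, 1
    while t < horizon:
        if len(active) == 1:
            # Commit to the last surviving arm for the remaining rounds.
            regret += (horizon - t) * (best - means[active[0]])
            break
        remaining = horizon - t
        if remaining < batch * len(active):
            # Not enough rounds for a full epoch: play active arms round-robin.
            for i in range(remaining):
                regret += best - means[active[i % len(active)]]
            break
        estimates = {}
        for a in active:
            rewards = [1 if rng.random() < means[a] else 0 for _ in range(batch)]
            regret += batch * (best - means[a])
            estimates[a] = shuffle_private_sum(rewards, eps, delta, rng) / batch
        t += batch * len(active)
        # Sampling (Hoeffding-style) term plus privacy-noise term; constants illustrative.
        width = (math.sqrt(math.log(4 * len(means) * horizon) / (2 * batch))
                 + math.sqrt(math.log(1 / delta)) / (eps * batch))
        top = max(estimates.values())
        active = [a for a in active if estimates[a] + width >= top - width]
        batch *= 2
    return regret, active
```

For example, `private_elimination([0.9, 0.5, 0.1], horizon=50_000, eps=1.0, delta=1e-6)` returns the accumulated pseudo-regret and the arms still active at the end. In this sketch the privacy term of the confidence width shrinks like 1/(ε · batch), while the sampling term shrinks like 1/√batch. This offers one intuition (not a proof) for why the privacy cost can appear as a separate additive term in regret bounds of the form stated above.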
