An Optimal Private Stochastic-MAB Algorithm Based on an Optimal Private Stopping Rule

We present a provably optimal differentially private algorithm for the stochastic multi-armed bandit problem, in contrast to the private analogues of the UCB algorithm [Mishra and Thakurta, 2015; Tossou and Dimitrakakis, 2016], which do not meet the recently discovered lower bound of $\Omega\left(\frac{K\log(T)}{\epsilon}\right)$ [Shariff and Sheffet, 2018]. Our construction is based on a different algorithm, Successive Elimination [Even-Dar et al., 2002], which repeatedly pulls all remaining arms until some arm is found to be suboptimal, at which point it is eliminated. To devise a private analogue of Successive Elimination, we revisit the problem of a private stopping rule, which takes as input a stream of i.i.d. samples from an unknown distribution and returns a multiplicative $(1 \pm \alpha)$-approximation of the distribution's mean, and we prove the optimality of our private stopping rule. We then present the private Successive Elimination algorithm, which meets both the non-private lower bound [Lai and Robbins, 1985] and the above-mentioned private lower bound. We also empirically compare the performance of our algorithm with that of the private UCB algorithm.

[1] Peter S. Fader et al. Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments. Marketing Science, 2016.

[2] Donald A. Berry et al. Bandit Problems: Sequential Allocation of Experiments. 1986.

[3] Osamu Watanabe et al. Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms. Data Mining and Knowledge Discovery, 1999.

[4] Peter Auer et al. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002.

[5] Sampath Kannan et al. A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem. NeurIPS, 2018.

[6] Nikita Mishra et al. (Nearly) Optimal Differentially Private Stochastic Multi-Arm Bandits. UAI, 2015.

[7] Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[8] Csaba Szepesvári et al. Empirical Bernstein stopping. ICML, 2008.

[9] Roshan Shariff et al. Differentially Private Contextual Linear Bandits. NeurIPS, 2018.

[10] Christos Dimitrakakis et al. Algorithms for Differentially Private Multi-Armed Bandits. AAAI, 2016.

[11] Elaine Shi et al. Private and Continual Release of Statistics. ACM TISSEC, 2010.

[12] Stéphane Caron et al. Mixing bandits: a recipe for improved cold-start recommendations in a social network. SNAKDD, 2013.

[13] Vishesh Karwa et al. Finite Sample Differentially Private Confidence Intervals. ITCS, 2018.

[14] Vianney Perchet et al. Bounded regret in stochastic multi-armed bandits. COLT, 2013.

[15] Peter Auer et al. The Nonstochastic Multiarmed Bandit Problem. SIAM Journal on Computing, 2002.

[16] Nando de Freitas et al. Portfolio Allocation for Bayesian Optimization. UAI, 2010.

[17] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables. 1963.

[18] Thomas Steinke et al. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. TCC, 2016.

[19] Zheng Wen et al. Cascading Bandits: Learning to Rank in the Cascade Model. ICML, 2015.

[20] H. Robbins. Some aspects of the sequential design of experiments. 1952.

[21] Martha White et al. High-confidence error estimates for learned value functions. UAI, 2018.

[22] Richard M. Karp et al. An Optimal Algorithm for Monte Carlo Estimation. SIAM Journal on Computing, 2000.

[23] Moni Naor et al. Differential privacy under continual observation. STOC, 2010.

[24] Shie Mannor et al. PAC Bounds for Multi-armed Bandit and Markov Decision Processes. COLT, 2002.

[25] T. L. Lai and H. Robbins. Asymptotically Efficient Adaptive Allocation Rules. 1985.

[26] Adam D. Smith et al. (Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings. NIPS, 2013.