Privacy Amplification via Shuffling for Linear Contextual Bandits

Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information that may be sensitive and needs to be protected. Inspired by this scenario, we study the contextual linear bandit problem with differential privacy (DP) constraints. While the literature has focused on either centralized (joint DP, JDP) or local (local DP, LDP) privacy, we consider the shuffle model of privacy and show that it is possible to achieve a privacy/utility trade-off between JDP and LDP. By leveraging shuffling from privacy and batching from bandits, we present an algorithm with regret bound Õ(T^{2/3}/ε^{1/3}), while guaranteeing both central (joint) and local privacy. Our result shows that it is possible to obtain a trade-off between JDP and LDP by leveraging the shuffle model while preserving local privacy.
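To make the shuffle model concrete, the following is a minimal sketch of how one batch of user reports could pass through a local randomizer, a shuffler, and an analyzer. It is an illustration under stated assumptions, not the algorithm of the paper: the function names (`local_randomizer`, `shuffle`, `analyze`, `run_batch`), the use of Gaussian noise, and the parameter `sigma` are all hypothetical, and the noise is not calibrated to any privacy budget ε.

```python
import random


def local_randomizer(x, sigma, rng):
    """Perturb one user's vector with Gaussian noise before it leaves the device.

    Assumption: sigma is chosen by the caller; this sketch does not calibrate it
    to a privacy guarantee.
    """
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def shuffle(reports, rng):
    """Return the reports in a uniformly random order, hiding which user sent which."""
    shuffled = list(reports)
    rng.shuffle(shuffled)
    return shuffled


def analyze(reports):
    """Sum the anonymous reports coordinate-wise."""
    dim = len(reports[0])
    return [sum(r[j] for r in reports) for j in range(dim)]


def run_batch(batch, sigma, seed=0):
    """Process one batch: randomize locally, shuffle, then aggregate."""
    rng = random.Random(seed)
    noisy = [local_randomizer(x, sigma, rng) for x in batch]
    return analyze(shuffle(noisy, rng))


if __name__ == "__main__":
    batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(run_batch(batch, sigma=0.5))
```

In a batched bandit setting, the learner would only see the aggregate returned by `run_batch` at the end of each batch, which is the point at which it could update its estimate. Larger batches give the shuffler more reports to mix, at the cost of less frequent updates.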
