Generalized Linear Bandits with Local Differential Privacy

Contextual bandit algorithms are useful for personalized online decision-making. However, many applications, such as personalized medicine and online advertising, require individual-specific information for effective learning, while users' data must remain private from the server. This motivates bringing local differential privacy (LDP), a stringent notion of privacy, to contextual bandits. In this paper, we design LDP algorithms for stochastic generalized linear bandits that achieve the same regret bound as in non-private settings. Our main idea is to develop a stochastic gradient-based estimator and update mechanism that ensures LDP. We then exploit the flexibility of stochastic gradient descent (SGD), whose theoretical guarantees for bandit problems are rarely explored, to handle generalized linear bandits. We also develop an estimator and update mechanism based on Ordinary Least Squares (OLS) for linear bandits. Finally, we conduct experiments on both simulated and real-world datasets that demonstrate the consistently strong performance of our algorithms under LDP constraints, with reasonably small privacy parameters (ε, δ) that ensure strong privacy protection.
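The core idea of LDP is that each user perturbs their own contribution before it reaches the server. The minimal NumPy sketch below illustrates this idea for both a gradient-based update and OLS sufficient statistics.

The sketch rests on these assumptions, none of which is taken from the paper:

- a logistic link with contexts satisfying ||x|| ≤ 1 and rewards in {0, 1};
- one report per user;
- the classical Gaussian mechanism (Dwork and Roth), with noise scale Δ·sqrt(2 ln(1.25/δ))/ε, which is stated for 0 < ε < 1.

All function names are hypothetical. So are the greedy arm choice, the unit-ball projection, the step size η₀/√t and the iterate averaging. This is not the authors' algorithm or their noise calibration.

```python
import numpy as np


def gaussian_sigma(sensitivity, eps, delta):
    """Noise scale of the classical Gaussian mechanism (stated for 0 < eps < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps


def clip(v, c):
    """Rescale v so its L2 norm is at most c (also the projection onto the c-ball)."""
    norm = np.linalg.norm(v)
    return v if norm <= c else v * (c / norm)


def private_gradient(theta, x, y, eps, delta, rng, c=1.0):
    """User side (hypothetical): clipped logistic-loss gradient plus Gaussian noise."""
    mu = 1.0 / (1.0 + np.exp(-x @ theta))      # logistic link (assumed GLM)
    g = clip((mu - y) * x, c)
    # Any two clipped gradients differ by at most 2c in L2 norm.
    return g + rng.normal(0.0, gaussian_sigma(2.0 * c, eps, delta), size=g.shape)


def private_ols_stats(x, y, eps, delta, rng):
    """User side (hypothetical): privatize (x x^T, x y), assuming ||x|| <= 1, |y| <= 1."""
    d = x.shape[0]
    stat = np.concatenate([np.outer(x, x).ravel(), x * y])
    # ||x x^T||_F = ||x||^2 <= 1 and ||x y|| <= 1, so each report has norm <= sqrt(2)
    # and two reports differ by at most 2*sqrt(2).
    sigma = gaussian_sigma(2.0 * np.sqrt(2.0), eps, delta)
    noisy = stat + rng.normal(0.0, sigma, size=stat.shape)
    V = noisy[: d * d].reshape(d, d)
    return (V + V.T) / 2.0, noisy[d * d:]      # symmetrizing is post-processing


def ols_estimate(V_sum, u_sum, lam=1.0):
    """Server side (hypothetical): ridge-regularized least squares from noisy sums."""
    d = V_sum.shape[0]
    return np.linalg.solve(V_sum + lam * np.eye(d), u_sum)


def run_ldp_sgd_bandit(T=2000, d=3, K=5, eps=0.9, delta=1e-3, eta0=0.1, seed=0):
    """Toy greedy logistic bandit whose server only sees privatized gradients."""
    rng = np.random.default_rng(seed)
    theta_star = clip(rng.normal(size=d), 1.0)     # hypothetical true parameter
    theta = np.zeros(d)
    theta_avg = np.zeros(d)
    for t in range(1, T + 1):
        arms = rng.uniform(-1.0, 1.0, size=(K, d)) / np.sqrt(d)  # each ||x|| <= 1
        x = arms[np.argmax(arms @ theta_avg)]                     # greedy choice
        p = 1.0 / (1.0 + np.exp(-x @ theta_star))
        y = float(rng.random() < p)                               # Bernoulli reward
        g = private_gradient(theta, x, y, eps, delta, rng)        # only g is sent
        theta = clip(theta - (eta0 / np.sqrt(t)) * g, 1.0)        # projected SGD step
        theta_avg += (theta - theta_avg) / t                      # running average
    return theta_avg, theta_star
```

The OLS-style route works the same way at the server. It sums the users' noisy `(V, u)` pairs and passes them to `ols_estimate`. The ridge term is there because the noisy Gram matrix need not be positive definite.

Each report carries noise of constant scale. Summed over n users, the noise therefore grows like √n while the signal grows like n, which is why locally private estimators need many rounds. The sketch makes no claim about accuracy at the default parameters.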
