Secure Cumulative Reward Maximization in Linear Stochastic Bandits

The linear stochastic multi-armed bandit is a sequential learning setting in which, at each round, a learner chooses an arm and receives a stochastic reward whose expectation is an unknown linear function of the chosen arm. The learner's goal is to maximize the cumulative reward. Linear bandits have popular applications such as online recommendation based on user preferences, where obtaining a high reward means recommending an item with a high expected rating. We address the security concerns that arise when outsourcing the data and the cumulative reward maximization algorithm to an honest-but-curious cloud. We propose LinUCB-DS, a distributed and secure protocol that achieves the same cumulative reward as the standard LinUCB algorithm without disclosing to the cloud the linear function used to draw arm rewards. We formally prove the complexity and security properties of LinUCB-DS. We also show that LinUCB-DS can easily be adapted to secure the SpectralUCB algorithm, which improves on LinUCB for a class of linear bandits. We demonstrate the feasibility of our protocols via a proof-of-concept experimental study using the MovieLens movie recommendation dataset.
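To make the setting concrete, here is a minimal sketch of the standard (plaintext) LinUCB loop that LinUCB-DS secures: a ridge-regression estimate of the unknown parameter plus an exploration bonus on each arm. The arm features, the exploration parameter alpha, and the Gaussian noise model below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def linucb(arms, theta_star, rounds=1000, alpha=1.0, noise=0.1, lam=1.0):
    """Plaintext LinUCB sketch; theta_star is the hidden linear function."""
    d = arms.shape[1]
    A = lam * np.eye(d)          # regularized Gram matrix of played arms
    b = np.zeros(d)              # accumulated reward-weighted features
    total_reward = 0.0
    rng = np.random.default_rng(0)
    for _ in range(rounds):
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b    # ridge-regression estimate of theta_star
        # upper confidence bound per arm: estimate + exploration bonus
        ucb = arms @ theta_hat + alpha * np.sqrt(
            np.einsum('ij,jk,ik->i', arms, A_inv, arms))
        x = arms[np.argmax(ucb)]                             # play best UCB arm
        r = x @ theta_star + noise * rng.standard_normal()   # stochastic reward
        A += np.outer(x, x)
        b += r * x
        total_reward += r
    return total_reward

# Example run: 20 arms in R^5, hidden parameter theta_star.
rng = np.random.default_rng(42)
arms = rng.standard_normal((20, 5))
theta_star = rng.standard_normal(5)
print(linucb(arms, theta_star))
```

As the abstract states, LinUCB-DS carries out this same computation in a distributed, secure fashion, matching the cumulative reward of standard LinUCB while keeping the hidden linear function (theta_star above) from the honest-but-curious cloud.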
