Privacy-Preserving Bandits

Contextual bandit algorithms (CBAs) often rely on personal data to provide recommendations. Centralized CBA agents use potentially sensitive data from recent interactions to personalize recommendations for end-users. Keeping the sensitive data local, by running a local agent on the user's device, protects the user's privacy; however, the agent takes longer to produce useful recommendations because it does not leverage feedback from other users. This paper proposes a technique we call Privacy-Preserving Bandits (P2B), a system that updates local agents by collecting feedback from other local agents in a differentially-private manner. Comparisons of our proposed approach with a non-private system and a fully-private (local) system show competitive performance on both synthetic benchmarks and real-world data. Specifically, we observed decreases of only 2.6% and 3.6% in multi-label classification accuracy, and a click-through rate (CTR) increase of 0.0025 in online advertising, for a privacy budget $\epsilon \approx 0.693$. These results suggest P2B is an effective approach to challenges arising in on-device privacy-preserving personalization.
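
To give a rough sense of the privacy budget quoted above: $\epsilon \approx 0.693$ is approximately $\ln 2$, so under the standard definition of differential privacy the probability of any output can change by a factor of at most $e^{\epsilon} \approx 2$ between neighboring inputs. The sketch below illustrates this with binary randomized response, a textbook mechanism used here purely as an illustration. It is not the mechanism used by P2B, and the names `truth_probability` and `randomized_response` are hypothetical.

```python
import math
import random


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(true_bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability e^eps / (1 + e^eps), otherwise flip it.

    Illustrative textbook mechanism only; this is NOT P2B's mechanism.
    """
    return true_bit if rng.random() < truth_probability(epsilon) else not true_bit


epsilon = 0.693                      # privacy budget quoted in the abstract
ratio_bound = math.exp(epsilon)      # approximately 2, since ln 2 is about 0.693
p_truth = truth_probability(epsilon) # approximately 2/3

# The likelihood ratio between the two possible true inputs, for any reported
# value, is p_truth / (1 - p_truth), which equals e^epsilon.
likelihood_ratio = p_truth / (1.0 - p_truth)

rng = random.Random(0)  # fixed seed so that repeated runs give the same reports
reports = [randomized_response(True, epsilon, rng) for _ in range(10_000)]
empirical_truth_rate = sum(reports) / len(reports)

print(f"e^epsilon        = {ratio_bound:.4f}")
print(f"p_truth          = {p_truth:.4f}")
print(f"likelihood ratio = {likelihood_ratio:.4f}")
print(f"empirical rate   = {empirical_truth_rate:.4f}")
```

In this toy setting, an observer who sees a single report can be at most about twice as confident in one input as in the other, which is the kind of guarantee a budget of $\epsilon \approx 0.693$ expresses.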
