Incentivising Exploration and Recommendations for Contextual Bandits with Payments

We propose a contextual-bandit-based model that captures the learning and social-welfare goals of a web platform serving myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the items' inherent attributes and achieve sublinear regret while maximizing cumulative social welfare. We also derive theoretical bounds on the platform's cumulative cost of incentivization. Unlike previous work in this domain, we allow contexts to be chosen completely adversarially, with the adversary's behavior unknown to the platform. Our approach can improve user-engagement metrics on e-commerce stores, recommendation engines, and matching platforms.
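The mechanism described above can be illustrated with a minimal simulation sketch. This is not the paper's algorithm: it assumes a linear reward model, a LinUCB-style learner, and a payment rule that compensates a myopic user for the estimated utility gap between their greedy choice and the platform's exploratory recommendation; the parameters (`d`, `K`, `alpha`, noise scale) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 2000                 # context dim, number of items, rounds (hypothetical)
theta = rng.normal(size=(K, d))      # hidden item attributes the platform must learn

# Per-item ridge-regression state for a LinUCB-style learner
A = [np.eye(d) for _ in range(K)]
b = [np.zeros(d) for _ in range(K)]

alpha = 1.0                          # width of the exploration bonus (hypothetical)
total_payment = 0.0                  # cumulative cost of incentivization

for t in range(T):
    x = rng.normal(size=d)           # context (adversarial in the paper; random here)
    est = np.array([np.linalg.solve(A[k], b[k]) @ x for k in range(K)])
    ucb = np.array([est[k] + alpha * np.sqrt(x @ np.linalg.solve(A[k], x))
                    for k in range(K)])
    greedy = int(np.argmax(est))     # item a myopic user would pick on their own
    chosen = int(np.argmax(ucb))     # item the platform wants explored
    # Pay the estimated utility gap so the myopic user accepts the recommendation
    total_payment += max(0.0, est[greedy] - est[chosen])
    r = theta[chosen] @ x + rng.normal(scale=0.1)
    A[chosen] += np.outer(x, x)
    b[chosen] += r * x
```

As the estimates converge, the greedy and UCB choices coincide more often, so the per-round payment shrinks, which is the intuition behind bounding the platform's cumulative incentivization cost.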
