Bayesian Exploration with Heterogeneous Agents

In recommendation systems, users commonly both consume and produce information as they make strategic choices under uncertainty. While a social planner would balance “exploration” and “exploitation” using a multi-armed bandit algorithm, users' incentives may tilt this balance in favor of exploitation. We consider Bayesian Exploration: a simple model in which the recommendation system (the “principal”) controls the information flow to the users (the “agents”) and strives to incentivize exploration via information asymmetry. A single round of this model is a version of the well-known “Bayesian Persuasion” game from [24]. We allow heterogeneous users, relaxing a major assumption from prior work that users have the same preferences from one time step to another. The goal is now to learn the best personalized recommendations. One particular challenge is that it may be impossible to incentivize some of the user types to take some of the actions, no matter what the principal does or how much time she has. We consider several versions of the model, depending on whether and when the user types are reported to the principal, and design a near-optimal “recommendation policy” for each version. We also investigate how the model choice and the diversity of user types impact the set of actions that can possibly be “explored” by each type.
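As a toy illustration of the tension the abstract describes (not a construction from the paper): a myopic agent who exploits the current posterior mean can get stuck on a safe arm forever, while a policy that explores (here, Thompson sampling as a stand-in for the planner's bandit algorithm) eventually discovers a better arm. All priors and payoffs below are made up for the example.

```python
import numpy as np

def simulate(policy, T=200, seed=0):
    """Toy two-arm setting: arm 0 has a known mean of 0.5; arm 1 has true
    mean 0.9 but a pessimistic Beta(4, 6) prior (posterior mean 0.4)."""
    rng = np.random.default_rng(seed)
    a, b = 4.0, 6.0          # Beta posterior parameters for arm 1
    pulls_arm1 = 0
    for _ in range(T):
        if policy == "myopic":
            # Exploit: compare the known mean 0.5 to arm 1's posterior mean.
            choose1 = a / (a + b) > 0.5
        else:
            # Thompson sampling: draw from the posterior, which
            # occasionally favors arm 1 and so explores it.
            choose1 = rng.beta(a, b) > 0.5
        if choose1:
            pulls_arm1 += 1
            r = int(rng.random() < 0.9)   # Bernoulli reward, true mean 0.9
            a, b = a + r, b + (1 - r)
    return pulls_arm1

# The myopic agent never pulls arm 1 (0.4 < 0.5 and the prior never updates),
# so it can never learn that arm 1 is in fact better.
```

This is exactly the failure the principal must work around: without some information asymmetry or exploration incentive, myopic agents leave the better action untouched.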

[1] Nikhil R. Devanur et al., The price of truthfulness for pay-per-click auctions, 2009, EC '09.

[2] Zhiwei Steven Wu et al., The Externalities of Exploration and How Data Diversity Helps Exploitation, 2018, COLT.

[3] Christian M. Ernst et al., Multi-armed Bandit Allocation Indices, 1989.

[4] Umar Syed et al., Learning Prices for Repeated Auctions with Strategic Buyers, 2013, NIPS.

[5] Yishay Mansour et al., Bayesian Exploration: Incentivizing Exploration in Bayesian Games, 2016, EC.

[6] Moshe Babaioff et al., Truthful mechanisms with implicit payment computation, 2010, EC '10.

[7] Patrick Hummel et al., Learning and incentives in user-generated content: multi-armed bandits with endogenous arms, 2013, ITCS '13.

[8] Umar Syed et al., Repeated Contextual Auctions with Strategic Buyers, 2014, NIPS.

[9] Carlos Riquelme et al., Human Interaction with Recommendation Systems, 2017, AISTATS.

[10] Kostas Bimpikis et al., Crowdsourcing Exploration, 2018, Manag. Sci.

[11] Bangrui Chen et al., Incentivizing Exploration by Heterogeneous Users, 2018, COLT.

[12] Moshe Tennenholtz et al., Economic Recommendation Systems, 2015, ArXiv.

[13] Aleksandrs Slivkins et al., Bandits with Knapsacks, 2013, IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS).

[14] Yishay Mansour et al., Implementing the “Wisdom of the Crowd”, 2014, Journal of Political Economy.

[15] Kevin D. Glazebrook et al., Multi-Armed Bandit Allocation Indices, 2011.

[16] Jon M. Kleinberg et al., Incentivizing exploration, 2014, EC.

[17] Khashayar Khosravi et al., Mostly Exploration-Free Algorithms for Contextual Bandits, 2017, Manag. Sci.

[18] Sampath Kannan et al., A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem, 2018, NeurIPS.

[19] Yeon-Koo Che et al., Optimal Design for Social Learning, 2015.

[20] Yishay Mansour et al., Competing Bandits: Learning Under Competition, 2017, ITCS.

[21] Annie Liang et al., Overabundant Information and Learning Traps, 2018, EC.

[22] Andreas Krause et al., Truthful incentives in crowdsourcing tasks using regret minimization mechanisms, 2013, WWW.

[23] D. Bergemann et al., Dynamic Auctions: A Survey, 2010.

[24] Sampath Kannan et al., Fairness Incentives for Myopic Agents, 2017, EC.

[25] Moshe Tennenholtz et al., Economic Recommendation Systems: One Page Abstract, 2016, EC.

[26] S. Matthew Weinberg et al., Selling to a No-Regret Buyer, 2017, EC.

[27] Gábor Lugosi et al., Prediction, Learning, and Games, 2006.

[28] Moshe Babaioff et al., Characterizing truthful multi-armed bandit mechanisms: extended abstract, 2008, EC '09.

[29] Yishay Mansour et al., Bayesian Incentive-Compatible Bandit Exploration, 2018.

[30] Sébastien Bubeck et al., Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.

[31] J. Bather et al., Multi-Armed Bandit Allocation Indices, 1990.

[32] M. Cripps et al., Strategic Experimentation with Exponential Bandits, 2003.

[33] Imre Csiszár et al., Information Theory: Coding Theorems for Discrete Memoryless Systems, Second Edition, 2011.

[34] Aleksandrs Slivkins et al., Adaptive contract design for crowdsourcing markets: bandit algorithms for repeated principal-agent problems, 2014, J. Artif. Intell. Res.

[35] E. Glen Weyl et al., Descending Price Optimally Coordinates Search, 2016, EC.

[36] Nicole Immorlica et al., Incentivizing Exploration with Unbiased Histories, 2018, ArXiv.

[37] Annie Liang et al., Optimal and Myopic Information Acquisition, 2017, EC.

[38] Aleksandrs Slivkins et al., Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems, 2016, J. Artif. Intell. Res.

[39] Emir Kamenica et al., Bayesian Persuasion, 2009.

[40] Omar Besbes et al., Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms, 2009, Oper. Res.

[41] Frank Thomson Leighton et al., The value of knowing a demand curve: bounds on regret for online posted-price auctions, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS).