Learning and incentives in user-generated content: multi-armed bandits with endogenous arms

Motivated by the problem of learning the qualities of user-generated content on the Web, we study a multi-armed bandit problem in which the number of arms and their success probabilities are endogenously determined by strategic agents responding to the incentives created by the learning algorithm. We model contributors of user-generated content as attention-motivated agents who benefit when their contribution is displayed and who incur a cost for quality, where a contribution's quality is the probability that it receives a positive viewer vote. Agents strategically choose whether to contribute and, if so, at what quality, in response to the algorithm that decides how contributions are displayed. The algorithm, which would like eventually to display only the highest-quality contributions, can learn a contribution's quality only from the viewer votes it receives when displayed. Inferring the relative qualities of contributions from viewer feedback, so as to maximize overall viewer satisfaction over time, can then be modeled as the classic multi-armed bandit problem, except that the arms available to the bandit, and therefore the achievable regret, are endogenously determined by strategic agents. A good algorithm for this setting must therefore not only identify the best contributions quickly but also incentivize high-quality contributions to choose among in the first place. We first analyze the well-known UCB algorithm Ma [Auer et al. 2002] as a mechanism in this setting, where the total number of potential contributors (arms), K, can grow with the total number of viewers (available periods), T, and the maximum possible success probability of an arm, γ, may be bounded away from 1 to model malicious or error-prone viewers in the audience.
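To make the learning side of this setting concrete, here is a minimal Python sketch of UCB1 as described by Auer et al. (2002), used as a display rule: in each period one contribution is shown, and its quality, modelled as a Bernoulli success probability, generates a viewer vote. The function name `ucb_display`, the fixed list `qualities`, and the seeded random generator are illustrative assumptions; in the paper the qualities (and the number of arms) are chosen strategically by agents rather than given, and this sketch computes no equilibrium.

```python
import math
import random


def ucb_display(qualities, T, rng=None):
    """Show one contribution per period using the UCB1 index (Auer et al. 2002).

    qualities[i] is arm i's success probability, i.e. the chance that a viewer
    votes positively when arm i is displayed (a hypothetical fixed input here).
    Returns per-arm display counts and the total number of positive votes.
    """
    if rng is None:
        rng = random.Random(0)  # seeded so runs are reproducible
    K = len(qualities)
    displays = [0] * K   # n_i: number of times arm i has been shown
    successes = [0] * K  # positive votes collected by arm i
    positive_votes = 0
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1  # initialization: display every arm once
        else:
            plays_so_far = t - 1
            # UCB1 index: empirical mean plus sqrt(2 ln n / n_i) exploration bonus
            arm = max(
                range(K),
                key=lambda i: successes[i] / displays[i]
                + math.sqrt(2 * math.log(plays_so_far) / displays[i]),
            )
        vote = 1 if rng.random() < qualities[arm] else 0  # simulated viewer vote
        displays[arm] += 1
        successes[arm] += vote
        positive_votes += vote
    return displays, positive_votes


displays, votes = ucb_display([0.9, 0.5, 0.2], T=5000)
print(displays, votes)
```

With these inputs most displays should go to the first arm. Since contributors in this model are attention-motivated, `displays[i]` is the quantity an agent's benefit would depend on; the paper's exact benefit and cost-of-quality functions are not reproduced here.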
We show that while Ma can incentivize high-quality arms and achieve strong sublinear equilibrium regret when K(T) does not grow too quickly with T, it incentivizes very low-quality contributions when K(T) scales proportionally with T. We then show that modifying the UCB mechanism to explore a randomly chosen restricted subset of √T arms provides excellent incentive properties: this modified mechanism achieves strong sublinear regret (sublinear regret measured against the maximum achievable quality γ) in every equilibrium, for all ranges of K(T) ≤ T and all possible values of the audience parameter γ.
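The restricted-exploration idea can be sketched in the same style: sample a random subset of roughly √T arms, ignore the rest, and run UCB1 on that subset. The sketch also reports strong regret, which we assume here to be γT minus the sum of the qualities of the contributions actually displayed (a pseudo-regret form). The function name `restricted_ucb`, the rounding to `ceil(sqrt(T))`, and capping the subset size at K are illustrative assumptions rather than the paper's exact construction.

```python
import math
import random


def restricted_ucb(qualities, T, gamma, seed=0):
    """Run UCB1 on a random subset of about sqrt(T) arms (hypothetical sketch).

    Returns the explored arms and the strong regret over T periods.
    """
    rng = random.Random(seed)
    K = len(qualities)
    m = min(K, math.ceil(math.sqrt(T)))  # assumption: subset size ceil(sqrt(T)), capped at K
    chosen = rng.sample(range(K), m)     # arms outside this subset are never displayed
    displays = {i: 0 for i in chosen}
    successes = {i: 0 for i in chosen}
    displayed_quality = 0.0              # sum of qualities of displayed arms
    for t in range(1, T + 1):
        unplayed = [i for i in chosen if displays[i] == 0]
        if unplayed:
            arm = unplayed[0]            # initialization: show each explored arm once
        else:
            plays_so_far = t - 1
            # UCB1 index restricted to the explored subset
            arm = max(
                chosen,
                key=lambda i: successes[i] / displays[i]
                + math.sqrt(2 * math.log(plays_so_far) / displays[i]),
            )
        successes[arm] += 1 if rng.random() < qualities[arm] else 0
        displays[arm] += 1
        displayed_quality += qualities[arm]
    # Assumption: strong regret = gamma * T minus the total quality displayed.
    strong_regret = gamma * T - displayed_quality
    return chosen, strong_regret


chosen, regret = restricted_ucb([0.5] * 1000, T=400, gamma=0.5)
print(len(chosen), regret)
```

Here only 20 of the 1000 arms are ever explored, and the reported strong regret is zero because every arm already attains γ. The sketch treats qualities as fixed inputs; the paper's incentive result concerns the qualities agents choose in equilibrium in response to this mechanism, which a simulation with fixed qualities does not capture.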

[1] Moshe Babaioff, et al. Truthful mechanisms with implicit payment computation, 2010, EC '10.

[2] Johannes Gerd Becker, et al. On the existence of symmetric mixed strategy equilibria, 2006.

[3] Patrick Hummel, et al. Implementing optimal outcomes in social computing: a game-theoretic approach, 2012, WWW.

[4] Rémi Munos, et al. Algorithms for Infinitely Many-Armed Bandits, 2008, NIPS.

[5] Nikhil R. Devanur, et al. The price of truthfulness for pay-per-click auctions, 2009, EC '09.

[6] Felix Schlenk, et al. Proof of Theorem 3, 2005.

[7] S. Gelly, et al. Anytime many-armed bandits, 2007.

[8] David C. Parkes, et al. Designing incentives for online question and answer forums, 2009, EC '09.

[9] David C. Parkes, et al. The role of game theory in human computation systems, 2009, HCOMP '09.

[10] David C. Parkes, et al. A game-theoretic analysis of the ESP game, 2013, TEAC.

[11] R. Preston McAfee, et al. Incentivizing high-quality user-generated content, 2011, WWW.

[12] Moshe Babaioff, et al. Characterizing truthful multi-armed bandit mechanisms: extended abstract, 2009, EC '09.

[13] Jia Yuan Yu, et al. Mean field equilibria of multi armed bandit games, 2012, 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[14] R. Varga, et al. Proof of Theorem 4, 1983.

[15] Patrick Hummel, et al. A game-theoretic analysis of rank-order mechanisms for user-generated content, 2011, EC '11.

[16] Robert D. Kleinberg, et al. Online decision problems with large strategy sets, 2005.

[17] Robert W. Chen, et al. Bandit problems with infinitely many arms, 1997.

[18] Nicola Gatti, et al. Truthful learning mechanisms for multi-slot sponsored search auctions with externalities, 2012, Artif. Intell.

[19] Moshe Babaioff, et al. Characterizing truthful multi-armed bandit mechanisms: extended abstract, 2008, EC '09.

[20] D. Bergemann, et al. Learning and Strategic Pricing, 1996.

[21] W. Viscusi, et al. Job Hazards and Worker Quit Rates: An Analysis of Adaptive Worker Behavior, 1979.

[22] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[23] B. McCall, et al. A Sequential Study of Migration and Job Search, 1987, Journal of Labor Economics.

[24] Rica Gonen, et al. An incentive-compatible multi-armed bandit mechanism, 2007, PODC '07.

[25] Alessandro Lazaric, et al. A truthful learning mechanism for contextual multi-slot sponsored search auctions with externalities, 2012, EC '12.

[26] Sham M. Kakade, et al. An Optimal Dynamic Mechanism for Multi-Armed Bandit Processes, 2010, ArXiv.