Adaptive contract design for crowdsourcing markets: bandit algorithms for repeated principal-agent problems

Crowdsourcing markets have emerged as popular platforms for matching available workers with tasks to complete. The payment for a particular task is typically set by the task's requester, and may be adjusted based on the quality of the completed work, for example, through the use of 'bonus' payments. In this paper, we study the requester's problem of dynamically adjusting quality-contingent payments for tasks. We consider a multi-round version of the well-known principal-agent model, in which, in each round, a worker strategically chooses an effort level that is not directly observable by the requester. In particular, our formulation significantly generalizes the budget-free online task pricing problems studied in prior work. We treat this problem as a multi-armed bandit problem, with each 'arm' representing a potential contract. To cope with the large (in fact, infinite) number of arms, we propose a new algorithm, AgnosticZooming, which discretizes the contract space into a finite number of regions, effectively treating each region as a single arm. This discretization is adaptively refined, so that more promising regions of the contract space are eventually discretized more finely. We provide a full analysis of this algorithm, showing that it achieves regret sublinear in the time horizon and substantially improves over non-adaptive discretization (which is the only competing approach in the literature).
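The idea of adaptively refined discretization can be illustrated with a minimal sketch. This is not AgnosticZooming as analyzed in the paper. It is a simplified one-dimensional illustration under several assumptions that do not come from the abstract: a contract is a single bonus value in [0, 1], each region is an interval played at its midpoint, the selection index is an empirical mean plus a confidence radius plus the interval width, and a region is split in half once its confidence radius falls below its width. The names `Cell`, `adaptive_discretization`, and `toy_requester_utility` are hypothetical.

```python
import math
import random


class Cell:
    """One interval of the (assumed one-dimensional) contract space, treated as an arm."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.pulls = 0      # number of rounds this cell was played
        self.total = 0.0    # sum of observed requester utilities

    @property
    def width(self):
        return self.hi - self.lo

    def radius(self, horizon):
        # Assumed UCB-style confidence radius; infinite until the cell is played.
        if self.pulls == 0:
            return float("inf")
        return math.sqrt(2.0 * math.log(horizon) / self.pulls)

    def index(self, horizon):
        # Optimistic score: empirical mean + confidence radius + cell width.
        if self.pulls == 0:
            return float("inf")
        return self.total / self.pulls + self.radius(horizon) + self.width


def adaptive_discretization(reward_fn, horizon, rng):
    """Play `horizon` rounds, refining the partition of [0, 1] where the index is high."""
    cells = [Cell(0.0, 1.0)]
    total_reward = 0.0
    for _ in range(horizon):
        cell = max(cells, key=lambda c: c.index(horizon))
        contract = 0.5 * (cell.lo + cell.hi)   # assumption: play the midpoint
        reward = reward_fn(contract, rng)
        cell.pulls += 1
        cell.total += reward
        total_reward += reward
        # Refine once the statistical uncertainty is no larger than the cell itself.
        if cell.radius(horizon) <= cell.width:
            cells.remove(cell)
            cells.extend([Cell(cell.lo, contract), Cell(contract, cell.hi)])
    return cells, total_reward


def toy_requester_utility(bonus, rng):
    """Hypothetical environment: high-quality work occurs with probability sqrt(bonus);
    the requester then gains value 1 and pays the bonus."""
    return (1.0 - bonus) if rng.random() < math.sqrt(bonus) else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    cells, total = adaptive_discretization(toy_requester_utility, 5000, rng)
    finest = min(cells, key=lambda c: c.width)
    print(f"{len(cells)} cells; finest cell is [{finest.lo:.3f}, {finest.hi:.3f}]")
```

In this toy environment the expected utility sqrt(b)(1 - b) peaks at b = 1/3, so one would expect the partition to become finer near that value and to stay coarse elsewhere. That is the qualitative behavior the abstract attributes to adaptive refinement, as opposed to a uniform grid fixed in advance.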
