Learning algorithms for online principal-agent problems (and selling goods online)

In a principal-agent problem, a principal seeks to motivate an agent to take an action beneficial to the principal, while spending as little as possible on the reward. This is complicated by the fact that the principal does not know the agent's utility function (or type). We study the online setting in which, at each round, the principal encounters a new agent and sets the rewards anew. At the end of each round, the principal observes only the action that the agent took, not the agent's type. The principal must learn to set the rewards optimally. We show that this setting generalizes that of selling a digital good online. We study and experimentally compare three main approaches to this problem. First, we show how to apply a standard bandit algorithm to this setting. Second, for the case where the distribution of agent types is fixed (but unknown to the principal), we introduce a new gradient ascent algorithm. Third, for the case where the distribution of agent types is fixed and the principal has a prior belief (a distribution) over a limited class of type distributions, we study a Bayesian approach.
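As a concrete illustration of the first approach, the digital-good special case can be attacked with an adversarial bandit algorithm over a discretized price grid: each candidate price is an arm, and only the realized revenue of the posted price is observed. The sketch below uses EXP3 for this purpose; the function name, the price grid, and the valuation stream are illustrative assumptions, not part of the paper.

```python
import math
import random

def exp3_posted_prices(prices, valuations, gamma=0.1, seed=0):
    """EXP3 over a discretized price grid for selling a digital good.

    Each round, a buyer with a private valuation v arrives; we post a
    price p and earn p if v >= p, else 0.  Only the realized revenue of
    the chosen price is observed (bandit feedback), mirroring the
    online setting in which the principal sees the agent's action but
    not the agent's type.
    """
    rng = random.Random(seed)
    K = len(prices)
    max_price = max(prices)          # rewards normalized into [0, 1]
    weights = [1.0] * K
    total_revenue = 0.0
    for v in valuations:
        wsum = sum(weights)
        # mix the exponential weights with uniform exploration
        probs = [(1 - gamma) * w / wsum + gamma / K for w in weights]
        i = rng.choices(range(K), weights=probs)[0]
        p = prices[i]
        sold = v >= p
        total_revenue += p if sold else 0.0
        reward = (p / max_price) if sold else 0.0
        # importance-weighted update for the chosen arm only
        weights[i] *= math.exp(gamma * (reward / probs[i]) / K)
    return total_revenue
```

Discretizing the price range is what makes the continuum of posted prices fit the finite-armed bandit model; the grid resolution trades off discretization loss against the bandit regret, which grows with the number of arms.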
