Optimal Coordinated Planning Amongst Self-Interested Agents with Private State

Consider a multi-agent system in a dynamic and uncertain environment. Each agent's local decision problem is modeled as a Markov decision process (MDP) and agents must coordinate on a joint action in each period, which provides a reward to each agent and causes local state transitions. A social planner knows the model of every agent's MDP and wants to implement the optimal joint policy, but agents are self-interested and have private local state. We provide an incentive-compatible mechanism for eliciting state information that achieves the optimal joint plan in a Markov perfect equilibrium of the induced stochastic game. In the special case in which local problems are Markov chains and agents compete to take a single action in each period, we leverage Gittins allocation indices to provide an efficient factored algorithm and distribute computation of the optimal policy among the agents. Distributed, optimal coordinated learning in a multi-agent variant of the multi-armed bandit problem is obtained as a special case.
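As an illustration of the Gittins allocation indices mentioned above, the following is a minimal sketch of how an index could be computed for each state of a single agent's Markov chain. It uses the standard restart-in-state formulation solved by value iteration, which is not necessarily the factored algorithm of the paper. The function name `gittins_indices`, the transition matrix `P`, the reward vector `r`, and the discount factor `beta` are all assumptions introduced for this example.

```python
def gittins_indices(P, r, beta, tol=1e-10, max_iter=100_000):
    """Approximate the Gittins index of every state of one Markov chain.

    P    : row-stochastic transition matrix as a list of lists (assumed input)
    r    : per-state reward for activating the chain (assumed input)
    beta : discount factor, 0 < beta < 1

    For each state i, solve the "restart in i" problem by value iteration:
    in any state j the controller either continues from j or restarts
    from i. The index of i is (1 - beta) times the optimal value at i.
    """
    n = len(r)
    indices = []
    for i in range(n):
        V = [0.0] * n
        for _ in range(max_iter):
            # Value of continuing from each state j under the current estimate.
            cont = [
                r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                for j in range(n)
            ]
            # Restarting behaves exactly like continuing from state i.
            restart = cont[i]
            new_V = [max(c, restart) for c in cont]
            delta = max(abs(a - b) for a, b in zip(new_V, V))
            V = new_V
            if delta < tol:
                break
        indices.append((1.0 - beta) * V[i])
    return indices


if __name__ == "__main__":
    # Hypothetical two-state chain: state 0 pays 1 and then moves to the
    # absorbing state 1, which pays 0 forever.
    P = [[0.0, 1.0], [0.0, 1.0]]
    r = [1.0, 0.0]
    print(gittins_indices(P, r, beta=0.9))
```

For this hypothetical chain the indices come out close to 1 for state 0 and 0 for state 1. In the single-action setting described above, an index policy would, in each period, activate the agent whose current local state has the largest index; because each index depends only on that agent's own chain, each agent could in principle compute its own indices locally.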
