No-regret learning and a mechanism for distributed multiagent planning

We develop a novel mechanism for coordinated, distributed multiagent planning. We consider problems stated as a collection of single-agent planning problems coupled by common soft constraints on resource consumption. (Resources may be real or fictitious, the latter introduced as a tool for factoring the problem.) A key idea is to recast the distributed planning problem as learning in a repeated game between the original agents and a newly introduced group of adversarial agents who influence prices for the resources. The adversarial agents benefit from arbitrage: their incentive is to uncover violations of the resource usage constraints and, by selfishly adjusting prices, to encourage the original agents to avoid plans that cause such violations. If all agents employ no-regret learning algorithms in the course of this repeated interaction, we show that our mechanism achieves design goals such as social optimality (efficiency), budget balance, and Nash-equilibrium convergence to within an error that approaches zero as the agents gain experience. In particular, the agents' average plans converge to a socially optimal solution for the original planning task. We present experiments in a simulated network routing domain demonstrating our method's ability to reliably generate sound plans.
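The repeated game between planning agents and price-setting agents can be illustrated with a minimal sketch. It is not the mechanism from the paper: it assumes a single shared resource, one price agent, linear agent utilities, and projected online gradient steps as the no-regret learner for every participant. The function name `run_price_game`, the step size 1/sqrt(t), and all numeric values are hypothetical choices made for illustration.

```python
def clip(value, low, high):
    """Project a scalar onto the interval [low, high]."""
    return max(low, min(high, value))


def run_price_game(values, capacity, price_cap, rounds):
    """Toy repeated game (illustrative assumptions, not the paper's algorithm).

    Agent i picks a resource usage x_i in [0, 1] and pays loss
    (price - values[i]) * x_i. A single price agent picks a price in
    [0, price_cap] and earns price * (total usage - capacity), so it
    profits whenever the soft capacity constraint is violated.
    All players take projected online gradient steps.
    """
    n = len(values)
    plan = [0.5] * n          # current usage of each planning agent
    price = 0.0               # current resource price
    plan_sums = [0.0] * n     # running sums for the averaged plan

    for t in range(1, rounds + 1):
        step = 1.0 / t ** 0.5  # decaying step size (assumed schedule)

        # Record the plans actually played this round.
        for i in range(n):
            plan_sums[i] += plan[i]

        # Constraint violation observed by the price agent this round.
        excess = sum(plan) - capacity

        # Planning agents descend their own loss at the current price.
        plan = [clip(x - step * (price - v), 0.0, 1.0)
                for x, v in zip(plan, values)]

        # The price agent ascends its payoff: raise the price on overuse.
        price = clip(price + step * excess, 0.0, price_cap)

    avg_plan = [s / rounds for s in plan_sums]
    return avg_plan, price


# Two agents valuing the resource at 3 and 1, with capacity for one unit.
avg_plan, final_price = run_price_game([3.0, 1.0], 1.0, 5.0, 2000)
print(avg_plan, final_price)
```

In this toy instance the socially optimal allocation gives the whole unit to the agent with the higher value, so the averaged plan should approach (1, 0), with the price settling between the two valuations. Averaging the played plans, rather than reading off the last iterate, mirrors the abstract's statement that it is the agents' average plans that converge.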
