Flexible approximation of structured interactions in decentralized Markov decision processes

Our work is motivated by cooperative planning problems in which agents can affect each other’s transitions and rewards, and so benefit from coordinating their actions, but in doing so must account for uncertainty in how long those actions take. To reason about this uncertainty efficiently, agents can employ temporal decoupling (a paradigm that has been explored in a variety of restricted contexts [2, 3]) to constrain interactions to occur by selected time points, representing the uncertain occurrence at each time point with a probabilistic promise [5]. Here we summarize a reformulation of Becker’s Event-driven DEC-MDP problems [1] that uses commitment models to exploit temporal structure. We argue that, in addition to representing optimal solutions, our approach enables more efficient, scalable computation of approximate solutions and offers a natural flexibility by which interactions can be modeled in more or less detail.
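
As a rough illustration of a probabilistic promise tied to selected time points, the sketch below turns an assumed discrete duration distribution for one enabling action into cumulative "occurs by time t" probabilities. The names Commitment and promises_from_durations, the duration values, and the chosen time points are hypothetical and are not taken from the paper; this is a minimal sketch under those assumptions, not the authors' commitment model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    """Hypothetical promise: the interaction occurs by `time` with probability `prob`."""
    time: int
    prob: float


def promises_from_durations(start, duration_pmf, time_points):
    """Return one Commitment per selected time point.

    duration_pmf maps an action duration to its probability (assumed discrete).
    Each promise is the cumulative probability that an action begun at `start`
    has finished by the given time point.
    """
    commitments = []
    for t in sorted(time_points):
        p = sum(q for d, q in duration_pmf.items() if start + d <= t)
        commitments.append(Commitment(t, p))
    return commitments


# Illustrative numbers only: the action takes 2, 3, or 5 time units.
duration_pmf = {2: 0.5, 3: 0.25, 5: 0.25}
promises = promises_from_durations(0, duration_pmf, [2, 4, 6])
for c in promises:
    print(f"by t={c.time}: probability {c.prob}")
```

Choosing fewer time points gives a coarser summary of the interaction, and choosing more gives a finer one, which is one way to read the flexibility in modeling detail that the abstract describes.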

[1]  Victor R. Lesser, et al. Incorporating Uncertainty in Agent Commitments, 1999, ATAL.

[2]  Neil Immerman, et al. The Complexity of Decentralized Control of Markov Decision Processes, 2000, UAI.

[3]  Victor R. Lesser, et al. Decentralized Markov decision processes with event-driven interactions, 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2004).

[4]  Shlomo Zilberstein, et al. Bounded Policy Iteration for Decentralized POMDPs, 2005, IJCAI.

[5]  Edmund H. Durfee, et al. Partial global planning: a coordination framework for distributed hypothesis formation, 1991, IEEE Trans. Syst. Man Cybern.

[6]  Claudia V. Goldman, et al. Solving Transition Independent Decentralized Markov Decision Processes, 2004, J. Artif. Intell. Res.

[7]  Victor Lesser, et al. Environment Centered Analysis and Design of Coordination Mechanisms, 1996.

[8]  Claudia V. Goldman, et al. Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis, 2004, J. Artif. Intell. Res.

[9]  Milind Tambe, et al. Towards Flexible Teamwork, 1997, J. Artif. Intell. Res.

[10]  Marek Petrik, et al. Anytime Coordination Using Separable Bilinear Programs, 2007, AAAI.

[11]  François Charpillet, et al. MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs, 2005, UAI.

[12]  Reid G. Smith, et al. The Contract Net Protocol: High-Level Communication and Control in a Distributed Problem Solver, 1980, IEEE Transactions on Computers.

[13]  Shlomo Zilberstein, et al. Dynamic Programming for Partially Observable Stochastic Games, 2004, AAAI.

[14]  Edmund H. Durfee, et al. Commitment-driven distributed joint policy search, 2007, AAMAS '07.

[15]  Abdel-Illah Mouaddib, et al. A polynomial algorithm for decentralized Markov decision processes with temporal constraints, 2005, AAMAS '05.

[16]  Milind Tambe, et al. On opportunistic techniques for solving decentralized Markov decision processes with temporal constraints, 2007, AAMAS '07.

[17]  Shlomo Zilberstein, et al. Optimizing Memory-Bounded Controllers for Decentralized POMDPs, 2007, UAI.

[18]  Luke Hunsberger, et al. Algorithms for a temporal decoupling problem in multi-agent planning, 2002, AAAI/IAAI.

[19]  Shlomo Zilberstein, et al. Memory-Bounded Dynamic Programming for DEC-POMDPs, 2007, IJCAI.

[20]  Hector J. Levesque, et al. Intention is Choice with Commitment, 1990, Artif. Intell.

[21]  Makoto Yokoo, et al. Networked Distributed POMDPs: A Synergy of Distributed Constraint Optimization and POMDPs, 2005, IJCAI.

[22]  Edmund H. Durfee, et al. Commitment-Based Service Coordination, 2008, SOCASE.