An Investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs

Decentralized planning in uncertain environments is a complex task, generally handled with a decision-theoretic approach, mainly through the framework of Decentralized Partially Observable Markov Decision Processes (DEC-POMDPs). Although DEC-POMDPs are a general and powerful modeling tool, solving them is a task of overwhelming complexity, which can be doubly exponential. In this paper, we study an alternative formulation of DEC-POMDPs that relies on a sequence-form representation of policies. From this formulation, we show how to derive Mixed Integer Linear Programming (MILP) problems that, once solved, give exact optimal solutions to the DEC-POMDPs. We show that these MILPs can be derived either from combinatorial characteristics of the optimal solutions of DEC-POMDPs or from concepts borrowed from game theory. Through an experimental validation on classical test problems from the DEC-POMDP literature, we compare our approach to existing algorithms. Results show that mathematical programming outperforms dynamic programming but is less efficient than forward search, except on some particular problems. The main contributions of this work are the use of mathematical programming for DEC-POMDPs and a better understanding of DEC-POMDPs and of their solutions. We also argue that our alternative representation of DEC-POMDPs could help in designing novel algorithms that look for approximate solutions to DEC-POMDPs.
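To give a sense of the "doubly exponential" growth mentioned above, the following minimal sketch counts, for a single agent, the deterministic policy trees of a given horizon and, for comparison, the action-observation histories on which a sequence-form representation is defined. The function names and the example sizes (2 actions, 2 observations) are illustrative assumptions and are not taken from the paper.

```python
def policy_tree_nodes(num_obs: int, horizon: int) -> int:
    """Decision nodes in one agent's policy tree: 1 + |O| + ... + |O|^(T-1)."""
    return sum(num_obs ** t for t in range(horizon))


def num_pure_policies(num_actions: int, num_obs: int, horizon: int) -> int:
    """Deterministic policies: one action choice per decision node."""
    return num_actions ** policy_tree_nodes(num_obs, horizon)


def num_histories(num_actions: int, num_obs: int, horizon: int) -> int:
    """Histories a1 o1 a2 ... at of length 1..T: sum of |A|^t * |O|^(t-1)."""
    return sum(
        num_actions ** t * num_obs ** (t - 1) for t in range(1, horizon + 1)
    )


if __name__ == "__main__":
    # Hypothetical problem size, chosen only for illustration.
    for horizon in range(1, 6):
        print(
            horizon,
            num_pure_policies(2, 2, horizon),
            num_histories(2, 2, horizon),
        )
```

Under these assumptions, the number of policy trees grows doubly exponentially with the horizon, while the number of histories grows only exponentially, which is one reason a history-based representation can be attractive.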
