Probabilistic Inference Techniques for Scalable Multiagent Decision Making

Decentralized POMDPs provide an expressive framework for multiagent sequential decision making. However, the complexity of these models, NEXP-complete even for two agents, has limited their scalability. We present a new class of approximation algorithms built on novel connections between multiagent planning and machine learning. We show how the multiagent planning problem can be reformulated as inference in a mixture of dynamic Bayesian networks (DBNs). This planning-as-inference view makes the efficient inference techniques developed for DBNs applicable to multiagent decision making. To further improve scalability, we identify conditions sufficient to extend the approach to systems with dozens of agents: the inference required within the expectation-maximization framework decomposes into computations that often involve only a small subset of agents. We show that a number of existing multiagent planning models satisfy these conditions. Experiments on large planning benchmarks confirm the runtime and scalability benefits of our approach over existing techniques.
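
To make the reformulation concrete, the sketch below records the standard mixture-of-DBNs construction from the planning-as-inference literature (in the style of Toussaint and Storkey), assuming bounded rewards. The notation is ours, not necessarily the paper's: theta denotes the joint policy parameters, gamma the discount factor, and R_min, R_max the reward bounds. The reward is rescaled into a binary variable r emitted only at the last step of each mixture component, where each component is a finite-length DBN:

```latex
% Sketch of the mixture-of-DBNs likelihood (notation is ours).
% The binary reward variable r is emitted at the final step T, with
%   P(r = 1 | s_T, a_T) = (R(s_T, a_T) - R_min) / (R_max - R_min).
\begin{align}
  P(T)      &= (1 - \gamma)\,\gamma^{T}, \qquad T = 0, 1, 2, \dots \\
  L(\theta) &= P(r = 1; \theta)
             = \sum_{T=0}^{\infty} P(T)\, P(r = 1 \mid T; \theta) \\
  V(\theta) &= \frac{R_{\max} - R_{\min}}{1 - \gamma}\, L(\theta)
             + \frac{R_{\min}}{1 - \gamma}
\end{align}
```

Because V is an increasing affine function of L, any update that increases the likelihood of r = 1 also increases the expected discounted value, so EM's monotone-likelihood guarantee carries over to planning.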

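The minimal sketch below runs the resulting EM loop in the simplest special case, a single-agent fully observable MDP, where the M-step has a known closed form: pi(a|s) is reweighted in proportion to Q(s,a) and renormalized per state. The toy model and the helper names (likelihood, q_values) are illustrative assumptions, not the paper's algorithm, which optimizes factored Dec-POMDP policies instead.

```python
import numpy as np

# Minimal EM-for-planning sketch on a random single-agent MDP.
# Rewards lie in [0, 1], so they double as the Bernoulli parameter
# P(r = 1 | s, a) of the binary reward variable in the mixture of DBNs.
S, A, gamma, T_max = 3, 2, 0.95, 200
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, :] = next-state dist.
R = rng.uniform(size=(S, A))                 # R[s, a] in [0, 1]
b0 = np.full(S, 1.0 / S)                     # initial state distribution

def likelihood(pi):
    """L(theta) = sum_T (1 - gamma) * gamma^T * E[r_T], truncated at T_max."""
    L, b = 0.0, b0.copy()
    for T in range(T_max + 1):
        L += (1 - gamma) * gamma**T * np.einsum('s,sa,sa->', b, pi, R)
        b = np.einsum('s,sa,sax->x', b, pi, P)   # roll the DBN one step
    return L

def q_values(pi, sweeps=500):
    """Q^pi by fixed-point iteration: Q = R + gamma * P * V."""
    Q = np.zeros((S, A))
    for _ in range(sweeps):
        V = (pi * Q).sum(axis=1)
        Q = R + gamma * np.einsum('sax,x->sa', P, V)
    return Q

pi = np.full((S, A), 1.0 / A)                # uniform initial policy
for _ in range(50):
    # E- and M-step collapse to the closed form pi <- pi * Q^pi, renormalized
    # per state; the EM guarantee makes L(pi) non-decreasing across iterations.
    pi *= q_values(pi)
    pi /= pi.sum(axis=1, keepdims=True)
print(f"likelihood after EM: {likelihood(pi):.4f}")
```

In the multiagent setting, the same outer loop updates per-agent policy parameters; under the structural conditions the paper identifies, the forward and backward messages needed in the E-step involve only small subsets of agents at a time, which is the source of the scalability claimed in the abstract.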