A Concise Introduction to Decentralized POMDPs

This book introduces multiagent planning under uncertainty as formalized by decentralized partially observable Markov decision processes (Dec-POMDPs). The intended audience is researchers and graduate students working in areas of artificial intelligence related to sequential decision making: reinforcement learning, decision-theoretic planning for single agents, classical multiagent planning, decentralized control, and operations research.
