Teamwork in distributed POMDPs: execution-time coordination under model uncertainty

Despite their NEXP-complete policy generation complexity [1], Distributed Partially Observable Markov Decision Problems (DEC-POMDPs) have become a popular paradigm for multiagent teamwork [2, 6, 8]. DEC-POMDPs can quantitatively express observational and action uncertainty, and yet optimally plan communications and domain actions.

This paper focuses on teamwork under model uncertainty (i.e., potentially inaccurate transition and observation functions) in DEC-POMDPs. In many domains, we only have an approximate model of agent observation or transition functions. To address this challenge we rely on execution-centric frameworks [7, 11, 12], which simplify planning in DEC-POMDPs (e.g., by assuming cost-free communication at plan-time) and shift coordination reasoning to execution time. Specifically, during planning, these frameworks use a standard single-agent POMDP planner [4] to plan a policy for the team of agents by assuming zero-cost communication. Then, at execution time, agents model other agents' beliefs and actions, reason about when to communicate with teammates, and reason about what action to take if not communicating (an illustrative sketch of this execution-time reasoning appears at the end of this section). Unfortunately, past work in execution-centric approaches [7, 11, 12] also assumes a correct world model, and the presence of model uncertainty exposes key weaknesses that result in erroneous plans and additional inefficiency due to reasoning over incorrect world models at every decision epoch.

This paper provides two sets of contributions. The first is a new execution-centric framework for DEC-POMDPs called MODERN (MOdel uncertainty in Dec-pomdp Execution-time ReasoNing). MODERN is the first execution-centric framework for DEC-POMDPs explicitly motivated by model uncertainty. It is based on
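To make the execution-time reasoning described above concrete, the following is a minimal illustrative sketch, not the MODERN algorithm itself: an agent maintains a Bayesian belief over world states, keeps a (possibly stale) estimate of a teammate's belief, and communicates only when the estimated gain from synchronising beliefs exceeds an assumed communication cost. All identifiers (COMM_COST, q_value, joint_actions, etc.) are hypothetical placeholders introduced for illustration.

```python
# Illustrative sketch of execution-time coordination reasoning in an
# execution-centric DEC-POMDP framework (assumed, simplified structure).
from collections import defaultdict

COMM_COST = 0.5  # assumed fixed cost of one belief synchronisation


def belief_update(belief, transition, observation_fn, action, obs):
    """Standard Bayesian POMDP belief update: b'(s') ~ O(o|s',a) * sum_s T(s'|s,a) b(s)."""
    new_belief = defaultdict(float)
    for s, p in belief.items():
        for s2, pt in transition(s, action).items():
            new_belief[s2] += p * pt * observation_fn(s2, action).get(obs, 0.0)
    total = sum(new_belief.values())
    return {s: p / total for s, p in new_belief.items()} if total > 0 else dict(belief)


def expected_value(belief, q_value, action):
    """Expected Q-value of an action under a belief over states."""
    return sum(p * q_value(s, action) for s, p in belief.items())


def decide(my_belief, teammate_belief_estimate, joint_actions, q_value):
    """Return (communicate?, action) for one decision epoch."""
    # Action my own (up-to-date) belief suggests: a proxy for the choice
    # the team would make if beliefs were synchronised.
    best_synced = max(joint_actions, key=lambda a: expected_value(my_belief, q_value, a))
    # Action the teammate is likely to take given our stale model of its belief.
    best_unsynced = max(joint_actions,
                        key=lambda a: expected_value(teammate_belief_estimate, q_value, a))
    # Estimated gain from communicating = value difference, evaluated under my belief.
    gain = (expected_value(my_belief, q_value, best_synced)
            - expected_value(my_belief, q_value, best_unsynced))
    if gain > COMM_COST:
        return True, best_synced    # worth paying the communication cost
    return False, best_unsynced     # act on the uncommunicated estimate


if __name__ == "__main__":
    # Toy two-state example with two actions; the numbers are arbitrary.
    q = lambda s, a: {("s0", "go"): 1.0, ("s0", "wait"): 0.2,
                      ("s1", "go"): -1.0, ("s1", "wait"): 0.2}[(s, a)]
    print(decide({"s0": 0.9, "s1": 0.1},      # my up-to-date belief
                 {"s0": 0.4, "s1": 0.6},      # stale estimate of teammate's belief
                 ["go", "wait"], q))          # -> (True, 'go'): communicating pays off
```

The design choice illustrated here, trading an assumed communication cost against the expected loss from acting on a stale teammate model, is the kind of per-epoch reasoning that execution-centric frameworks perform instead of planning communication offline.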