One important challenge for a set of agents to collaborate efficiently is maintaining proper models of each other. An important aspect of these models is that they are often partial and incomplete. Thus far, there have been two common representations of agent models: MDP based and action based, both of which rest on action modeling. In many applications, agent models are not given in advance and hence must be learned. While it may seem convenient to use either MDP based or action based models for learning, in this paper we introduce a new representation based on capability models, which has several unique advantages. First, we show that capability models can be learned efficiently online via Bayesian learning, and that the learning process is robust to high degrees of incompleteness in plan execution traces (e.g., traces containing only start and end states). While such incompleteness presents learning challenges for MDP based and action based models, capability models can still {\em abstract} useful information out of these traces. As a result, capability models are useful in applications where such incompleteness is common, e.g., a robot learning a human's model from observations and interactions. Furthermore, when used in multi-agent planning (with each agent modeled separately), capability models provide a flexible abstraction of actions. The limitation, however, is that the synthesized plan is incomplete and abstract.
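To make the online Bayesian learning step concrete, the following is a minimal sketch, not the paper's actual formulation: it assumes a capability is the probability that an agent can bring about an end condition from a start condition, and models it with a conjugate Beta-Bernoulli pair so that each incomplete trace (only a start state, an end state, and a success flag) yields an O(1) posterior update. The class name, state encoding, and prior are illustrative assumptions.

```python
# Hedged sketch of online Bayesian learning of a capability model.
# Assumptions (not from the paper): a capability is keyed by a
# (start-state, end-state) pair of fluent sets, and its success
# probability gets a Beta(alpha, beta) prior, updated from traces
# that contain only start and end states plus a success flag.
from collections import defaultdict


class CapabilityModel:
    def __init__(self, alpha=1.0, beta=1.0):
        self.prior = (alpha, beta)
        # (start, end) -> [successes, failures]
        self.counts = defaultdict(lambda: [0, 0])

    def update(self, start, end, succeeded):
        # Conjugate Beta-Bernoulli update: just increment a count.
        key = (frozenset(start), frozenset(end))
        self.counts[key][0 if succeeded else 1] += 1

    def probability(self, start, end):
        # Posterior mean of Beta(alpha + s, beta + f).
        a, b = self.prior
        s, f = self.counts[(frozenset(start), frozenset(end))]
        return (a + s) / (a + b + s + f)


m = CapabilityModel()
m.update({"at_door"}, {"door_open"}, True)
m.update({"at_door"}, {"door_open"}, True)
m.update({"at_door"}, {"door_open"}, False)
print(m.probability({"at_door"}, {"door_open"}))  # (1+2)/(2+3) = 0.6
```

Because the update only touches counts, the model degrades gracefully under incompleteness: traces missing intermediate states still contribute evidence, whereas an action-level model would need the full state sequence to update its preconditions and effects.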