Agent Modelling in Partially Observable Domains

Monitoring selectivity is a key challenge faced by agents when modelling other agents (1): agents cannot continually monitor others because of the computational burden of such monitoring and modelling, yet a lack of it leads to increased uncertainty about the state of other agents. Monitoring selectivity is also crucially important when agents plan in the presence of action and observation uncertainty. Formally, this paper focuses on an agent that uses a POMDP to plan its activities in a multiagent setting, and illustrates the critical nature of the monitoring selectivity challenge in POMDPs. The paper presents heuristics that limit the amount of monitoring and modelling of other agents; these heuristics exploit the reward structure and transition probabilities to automatically determine where to curtail such monitoring and modelling. We concretely illustrate our techniques in the domain of software personal assistants, and present initial experimental results illustrating the efficiency of our approach.
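
Since the agent's planning model is central to the discussion, we briefly recall the standard POMDP formulation; the notation below is a conventional sketch and is not drawn from this paper's own definitions. A POMDP is a tuple $\langle S, A, T, \Omega, O, R \rangle$, where $S$ is the set of world states, $A$ the set of actions, $T(s, a, s') = P(s' \mid s, a)$ the transition function, $\Omega$ the set of observations, $O(s', a, o) = P(o \mid s', a)$ the observation function, and $R(s, a)$ the reward function. The agent maintains a belief state $b$, a probability distribution over $S$, which after taking action $a$ and receiving observation $o$ is updated as
\[
  b'(s') = \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{P(o \mid b, a)},
\]
and it plans over beliefs to maximise expected discounted reward. Intuitively, monitoring another agent corresponds to obtaining additional observations that sharpen $b$ with respect to that agent's state, so curtailing monitoring trades a flatter belief for reduced monitoring and modelling cost.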