Implementation Techniques for Solving POMDPs in Personal Assistant Agents

Agents or agent teams deployed to assist humans must often monitor the state of key processes in their environment (including the state of their human users) and make periodic decisions based on that monitoring. POMDPs appear well suited to these challenges, given environmental uncertainty and the cost of actions, but optimal policy generation for POMDPs is computationally expensive. This paper introduces two key implementation techniques, one exact and one approximate, that speed up POMDP policy generation by exploiting the notion of progress or dynamics in personal assistant domains and the density of policy vectors. Policy computation is restricted to the belief-space polytope that remains reachable given the progress structure of a domain: the first technique applies Lagrangian methods to compute a bounded belief-space support in polynomial time, and the second approximates policy vectors within the bounded belief polytope. We illustrate these techniques by enhancing two of the fastest existing algorithms for exact POMDP policy generation. The resulting order-of-magnitude speedups demonstrate the utility of our implementation techniques in facilitating the deployment of POMDPs within agents assisting human users.
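To make the core idea concrete, the sketch below shows a standard Bayesian POMDP belief update together with a membership check against component-wise belief bounds, standing in for the reachable belief polytope the abstract describes. The model matrices, bound values, and function names are illustrative assumptions, not the paper's actual construction (which derives the bounds via Lagrangian methods):

```python
import numpy as np

def belief_update(b, T, O, a, o):
    """Bayes filter: b'(s') proportional to O[a][o, s'] * sum_s b(s) T[a][s, s']."""
    unnormalized = O[a][o] * (b @ T[a])
    return unnormalized / unnormalized.sum()

def in_reachable_polytope(b, lower, upper, tol=1e-9):
    """Component-wise bounds as a simple stand-in for the reachable belief polytope."""
    return bool(np.all(b >= lower - tol) and np.all(b <= upper + tol))

# Illustrative 2-state, 1-action, 2-observation model (not from the paper).
T = [np.array([[0.9, 0.1],
               [0.2, 0.8]])]   # T[a][s, s']: transition probabilities
O = [np.array([[0.8, 0.3],
               [0.2, 0.7]])]   # O[a][o, s']: observation probabilities

b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, T, O, a=0, o=0)

# Hypothetical bounds on beliefs reachable under the domain's progress structure;
# policy vectors would only be computed (or pruned) over beliefs inside these bounds.
lower = np.array([0.1, 0.0])
upper = np.array([1.0, 0.9])
print(b1, in_reachable_polytope(b1, lower, upper))
```

In the paper's setting, pruning dominated policy vectors would then solve linear programs only over this restricted polytope rather than the full belief simplex, which is where the speedup comes from.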
