Towards efficient planning for real-world partially observable domains

My research goal is to build large-scale intelligent systems (both single- and multi-agent) that reason under uncertainty in complex, real-world environments. I foresee such systems being integrated into many critical facets of human life, ranging from intelligent assistants in hospitals and offices, to rescue agents in large-scale disaster response, to sensor agents tracking weather phenomena in earth-observing sensor webs. In my thesis, I have taken steps towards this goal in the context of systems that operate in partially observable domains which also exhibit transitional uncertainty (non-deterministic action outcomes). Given this uncertainty, Partially Observable Markov Decision Processes (POMDPs) and Distributed POMDPs are natural choices for modeling these domains. Unfortunately, the significant computational complexity of solving POMDPs (PSPACE-complete) and Distributed POMDPs (NEXP-complete) is a key obstacle: existing approaches that provide exact solutions do not scale, while approximate solutions provide no usable guarantees on quality.

My thesis addresses these issues using two key ideas. The first is exploiting structure in the domain: utilizing the structure present in the dynamics of the domain, or in the interactions between the agents, improves efficiency without sacrificing solution quality. The second is direct approximation in the value space: making calculated approximations at each step of the algorithm allows us to provide usable quality guarantees, and such guarantees may even be specified in advance. In contrast, existing approaches approximate in the belief space, which only indirectly approximates the value space and thus makes it difficult to compute functional bounds on the approximation.
Together, these key ideas allow for the efficient computation of optimal and quality-bounded solutions to complex, large-scale problems that were beyond the reach of existing algorithms.
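To make the second idea concrete, the following is a minimal sketch of direct value-space approximation on a toy POMDP. All numbers and the model itself are illustrative assumptions, not the thesis's actual domains or algorithms: it performs exact alpha-vector backups, then prunes any vector that another vector pointwise dominates up to a tolerance epsilon. Each such pruning step changes the value function by at most epsilon, so the total loss can be bounded in advance, before the algorithm runs.

```python
import itertools

# Toy 2-state, 2-action, 2-observation POMDP (illustrative numbers only).
S, A, O = 2, 2, 2
T = [[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]]  # T[a][s][s']
Z = [[[0.8, 0.2], [0.2, 0.8]], [[0.5, 0.5], [0.5, 0.5]]]  # Z[a][s'][o]
R = [[1.0, 0.0], [0.0, 1.0]]                               # R[a][s]
gamma = 0.95

def backup(vectors, epsilon):
    """One exact DP backup followed by epsilon-pruning in the value space."""
    new = []
    for a in range(A):
        # Project each current vector back through the transition and
        # observation models, one projection per observation.
        proj = [[[gamma * sum(T[a][s][sp] * Z[a][sp][o] * v[sp] for sp in range(S))
                  for s in range(S)] for v in vectors] for o in range(O)]
        # Cross-sum: pick one successor vector per observation.
        for choice in itertools.product(range(len(vectors)), repeat=O):
            new.append([R[a][s] + sum(proj[o][choice[o]][s] for o in range(O))
                        for s in range(S)])
    # epsilon-pruning: drop v if some kept vector w satisfies
    # w[s] >= v[s] - epsilon for all s; then for any belief b,
    # dropping v changes max_alpha b.alpha by at most epsilon.
    pruned = []
    for v in new:
        if not any(all(w[s] >= v[s] - epsilon for s in range(S)) for w in pruned):
            pruned.append(v)
    return pruned

vectors = [[0.0, 0.0]]
epsilon, horizon = 0.05, 3
for _ in range(horizon):
    vectors = backup(vectors, epsilon)

# Each backup loses at most epsilon, and earlier losses are discounted by
# gamma at every later backup, so the overall loss is bounded in advance:
error_bound = epsilon * sum(gamma ** t for t in range(horizon))
```

The point of the sketch is the error-accounting step: because the approximation is made directly on the value function (rather than on the belief space), the per-step loss is explicit and sums to a functional bound that can be fixed before planning begins.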
