A Closer Look at MOMDPs

The difficulties encountered in sequential decision-making under uncertainty are often linked to the large size of the state space. Exploiting the structure of the problem, for example through a factored representation, is usually an effective approach, but in the case of partially observable Markov decision processes (POMDPs), the fact that some state variables may be fully visible has not been sufficiently exploited. In this article, we present a complementary analysis and discussion of MOMDPs, a formalism that exploits the fact that the state space may be factored into a visible part and a hidden part. Starting from a POMDP description, we examine the structure of the belief update and the value function, and their consequences for value iteration, showing in particular how classical algorithms can be adapted to this factorization, and we demonstrate the resulting benefits through an empirical evaluation.
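To make the factorization concrete, the MOMDP belief update can be sketched as follows: since the visible component x of the state (x, y) is observed directly, the belief only needs to track the hidden component y, conditioned on the observed transition of x. This is a minimal illustrative sketch; the function name, tensor layout, and model conventions are assumptions for exposition, not taken from the paper.

```python
import numpy as np

def momdp_belief_update(b_y, x, x_next, a, o, Tx, Ty, O):
    """Belief update over the hidden component y of a MOMDP state (x, y).

    b_y     : current belief over hidden states, shape (|Y|,)
    x, x_next : visible state before and after the transition (observed)
    a, o    : action taken and observation received
    Tx      : Tx[x, y, a, x'] = P(x' | x, y, a)
    Ty      : Ty[x, y, a, x', y'] = P(y' | x, y, a, x')
    O       : O[a, x', y', o] = P(o | a, x', y')
    """
    # Predict each hidden y', jointly with the observed visible
    # transition from x to x_next under action a.
    pred = np.einsum('y,y,yz->z',
                     b_y, Tx[x, :, a, x_next], Ty[x, :, a, x_next, :])
    # Weight by the likelihood of the received observation o.
    b_next = O[a, x_next, :, o] * pred
    # Normalise; the sum is the probability P(o, x_next | b, x, a).
    return b_next / b_next.sum()
```

The key point of the factorization shows up in the shapes: the belief vector has dimension |Y| rather than |X|·|Y|, so the update (and the alpha-vectors of value iteration) scale with the hidden part only.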
