Exponential Family PCA for Belief Compression in POMDPs

Standard value-function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. This intractability stems largely from computing an optimal policy over the entire belief space. In real POMDP problems, however, most belief states are unlikely: the plausible beliefs lie on a structured, low-dimensional manifold embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs that exploits this belief space sparsity. We reduce the dimensionality of the belief space with exponential family Principal Components Analysis (E-PCA) [5], which turns the sparse, high-dimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than those conventional techniques can handle. We demonstrate the algorithm on a synthetic problem and on a mobile robot navigation task.
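
The compression step lends itself to a compact illustration. Below is a minimal sketch of exponential family PCA with a Poisson loss, fit by plain alternating gradient descent; the function name epca_poisson, the step size, the iteration count, and the toy Gaussian-bump beliefs are all illustrative assumptions, not the paper's actual setup (which uses a more sophisticated Newton-style solver; see [9]). The key property the sketch shares with the method is the link function: reconstructed beliefs exp(U Vᵀ) are automatically nonnegative, which is what makes E-PCA a good fit for sparse belief vectors. Planning would then operate on the k-dimensional rows of U rather than on the full belief simplex.

```python
import numpy as np

def epca_poisson(X, k, iters=3000, lr=0.01, seed=0):
    """Sketch of E-PCA with a Poisson link: find low-rank natural parameters
    Theta = U @ V.T minimizing the Bregman loss sum(exp(Theta) - X * Theta).
    Rows of X are sampled belief vectors; rows of U are their k-dim features."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = 0.01 * rng.standard_normal((n, k))   # belief features (one row per belief)
    V = 0.01 * rng.standard_normal((d, k))   # learned basis over states
    for _ in range(iters):
        G = np.exp(U @ V.T) - X              # gradient of the loss w.r.t. Theta
        U, V = U - lr * (G @ V), V - lr * (G.T @ U)  # alternating gradient steps
    return U, V

# Toy demo: sparse beliefs, each a narrow Gaussian bump on a 50-state grid.
rng = np.random.default_rng(1)
states = np.arange(50)
bumps = [np.exp(-0.5 * ((states - rng.uniform(0, 50)) / 2.0) ** 2)
         for _ in range(200)]
X = np.array([b / b.sum() for b in bumps])

U, V = epca_poisson(X, k=3)                  # 50-dim beliefs -> 3 features each
B_hat = np.exp(U @ V.T)
B_hat /= B_hat.sum(axis=1, keepdims=True)    # renormalize reconstructions
print("mean L1 reconstruction error:", np.abs(B_hat - X).sum(axis=1).mean())
```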

[1] Sebastian Thrun, et al. Coastal Navigation with Mobile Robots, 1999, NIPS.

[2] Wolfram Burgard, et al. Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva, 2000, Int. J. Robotics Res.

[3] Milos Hauskrecht. Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.

[4] Geoffrey E. Hinton, et al. Global Coordination of Local Linear Models, 2001, NIPS.

[5] Sanjoy Dasgupta, et al. A Generalization of Principal Components Analysis to the Exponential Family, 2001, NIPS.

[6] I. Jolliffe. Principal Component Analysis, 2002.

[7] Leslie Pack Kaelbling, et al. Acting under uncertainty: discrete Bayesian models for mobile-robot navigation, 1996, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '96).

[8] Sebastian Thrun. Monte Carlo POMDPs, 1999, NIPS.

[9] Geoffrey J. Gordon. Generalized² Linear² Models, 2002, NIPS.

[10] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[11] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.

[12] S. T. Roweis, et al. Nonlinear dimensionality reduction by locally linear embedding, 2000, Science.

[13] Jeff G. Schneider, et al. Autonomous helicopter control using reinforcement learning policy search methods, 2001, Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).

[14] Andrew Y. Ng, et al. Policy Search via Density Estimation, 1999, NIPS.

[15] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.

[16] H. Sebastian Seung, et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[17] J. Tenenbaum, et al. A global geometric framework for nonlinear dimensionality reduction, 2000, Science.