Paper information - Covering Number as a Complexity Measure for POMDP Planning and Learning

Characterizing the difficulty of partially observable Markov decision processes (POMDPs) in a meaningful way is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing theoretical work has linked the complexity of POMDP planning to the covering number of the reachable belief space, that is, of the set of belief points reachable from the initial belief point. In this paper, we present empirical evidence, on several small-scale benchmark problems, that the covering number of the reachable belief space (or just "covering number", for brevity) is a far better complexity measure than state-space size for both planning in and learning POMDPs. We also connect the covering number to the complexity of learning POMDPs by proposing a provably convergent algorithm for learning POMDPs without reset, given knowledge of the covering number.
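To make the notion of a covering number concrete, the following is a minimal Python sketch, not the procedure used in the paper. It enumerates the beliefs reachable from an initial belief in a small POMDP and then builds a greedy δ-cover of them under the L1 distance. The toy two-state model, the function names, the enumeration depth, and the choice of L1 distance are all assumptions made for illustration. The greedy cover is a valid δ-cover of the enumerated points, so its size is only an upper bound on the minimal cover size for those points.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes filter step. T[a][s, s2] = P(s2 | s, a), Z[a][s2, o] = P(o | s2, a)."""
    predicted = b @ T[a]                  # distribution over next states
    unnormalized = predicted * Z[a][:, o]
    total = unnormalized.sum()
    if total == 0.0:
        return None                       # observation o cannot occur here
    return unnormalized / total

def reachable_beliefs(b0, T, Z, depth):
    """Enumerate beliefs reachable from b0 within `depth` steps (breadth-first)."""
    n_actions, n_obs = len(T), Z[0].shape[1]
    frontier, beliefs = [b0], [b0]
    for _ in range(depth):
        next_frontier = []
        for b in frontier:
            for a in range(n_actions):
                for o in range(n_obs):
                    b_next = belief_update(b, a, o, T, Z)
                    if b_next is not None:
                        next_frontier.append(b_next)
        beliefs.extend(next_frontier)
        frontier = next_frontier
    return np.array(beliefs)

def greedy_cover(beliefs, delta):
    """Pick centers so that every belief lies within L1 distance delta of one."""
    centers = []
    for b in beliefs:
        if all(np.abs(b - c).sum() > delta for c in centers):
            centers.append(b)
    return np.array(centers)

# Hypothetical two-state, two-action, two-observation model (illustration only).
T = [np.eye(2),                           # action 0: state stays put
     np.full((2, 2), 0.5)]                # action 1: state is re-randomized
Z = [np.array([[0.85, 0.15],
               [0.15, 0.85]]),            # action 0: noisy reading of the state
     np.full((2, 2), 0.5)]                # action 1: uninformative observation

b0 = np.array([0.5, 0.5])
beliefs = reachable_beliefs(b0, T, Z, depth=3)
delta = 0.2
centers = greedy_cover(beliefs, delta)
print(f"{len(beliefs)} enumerated beliefs, greedy {delta}-cover uses {len(centers)} centers")
```

In this toy model the state space has only two states regardless of how many distinct beliefs are reachable, which illustrates why the two quantities can diverge as difficulty measures. The exhaustive enumeration grows exponentially with depth, so for anything beyond toy problems one would sample belief trajectories instead.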
