Point-based value iteration: an anytime algorithm for POMDPs

In a seminal paper, Lin and Reiter introduced the notion of progression for basic action theories in the situation calculus. Unfortunately, progression is not first-order definable in general. Recently, Vassos, Lakemeyer, and Levesque showed that when actions have only local effects, progression is first-order representable; however, they were able to show computability of the first-order representation only for a restricted class, and their proofs were quite involved. In this paper, we present a stronger result: for local-effect actions, progression is always first-order definable and computable. We give a very simple proof of this via the concept of forgetting. We also show first-order definability and computability results for a class of knowledge bases and actions with non-local effects. Moreover, for a certain class of local-effect actions and knowledge bases representing disjunctive information, we show that progression is not only first-order definable but also efficiently computable.
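
To make the central operation concrete, here is a minimal sketch of forgetting, assuming Lin and Reiter's standard definition for a ground atom; it is an illustration only, not the paper's full construction. Forgetting a ground atom p from a theory φ is the disjunction of the two truth-value substitutions:

$$ \mathrm{forget}(\varphi;\, p) \;\equiv\; \varphi[p/\top] \,\lor\, \varphi[p/\bot] $$

For example, forgetting F(a) from F(a) ∧ G(a) yields (⊤ ∧ G(a)) ∨ (⊥ ∧ G(a)) ≡ G(a), i.e., exactly what the theory entails without mentioning F(a). Under the local-effect assumption an action can change only finitely many ground fluent atoms (those built from its arguments), so progression can, roughly, be obtained by forgetting those atoms and asserting their updated values, which is what keeps the result first-order definable and computable.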
