Point-based value iteration: an anytime algorithm for POMDPs

In a seminal paper, Lin and Reiter introduced the notion of progression for basic action theories in the situation calculus. Unfortunately, progression is not first-order definable in general. Recently, Vassos, Lakemeyer, and Levesque showed that when actions have only local effects, progression is first-order representable; however, they were able to show computability of the first-order representation only for a restricted class, and their proofs were quite involved. In this paper, we present a stronger result: for local-effect actions, progression is always first-order definable and computable. We give a very simple proof of this via the concept of forgetting. We also show first-order definability and computability results for a class of knowledge bases and actions with non-local effects. Moreover, for a certain class of local-effect actions and knowledge bases representing disjunctive information, we show that progression is not only first-order definable but also efficiently computable.
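
To make the central operation concrete, here is a minimal sketch of forgetting, assuming Lin and Reiter's standard definition for a ground atom; it is an illustration only, not the paper's full construction. Forgetting a ground atom p from a theory φ is the disjunction of the two truth-value substitutions:

$$ \mathrm{forget}(\varphi;\, p) \;\equiv\; \varphi[p/\top] \,\lor\, \varphi[p/\bot] $$

For example, forgetting F(a) from F(a) ∧ G(a) yields (⊤ ∧ G(a)) ∨ (⊥ ∧ G(a)) ≡ G(a), i.e., exactly what the theory entails without mentioning F(a). Under the local-effect assumption an action can change only finitely many ground fluent atoms (those built from its arguments), so progression can, roughly, be obtained by forgetting those atoms and asserting their updated values, which is what keeps the result first-order definable and computable.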
