Reinforcement Learning for Golog Programs

A special feature of programs in the action language Golog is non-deterministic actions, which require an agent to make choices during program execution. In the presence of stochastic actions and rewards, Finzi and Lukasiewicz have shown how to arrive at optimal choices using reinforcement learning techniques applied to the first-order MDP representation induced by the program. In this paper we extend their ideas in two ways: we adopt a first-order SMDP representation, which allows Q-updates to be restricted to the non-deterministic choice points within a program, and we give a completely declarative specification of a learning Golog interpreter.
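To illustrate the kind of backup the SMDP view enables, the following is a minimal, hypothetical sketch of an SMDP Q-update applied only at non-deterministic choice points: between two choice points the agent runs a (possibly multi-step) program segment of duration tau, accumulating a discounted reward, and the backup then discounts the successor value by gamma**tau. All names here (ALPHA, GAMMA, smdp_q_update) are illustrative assumptions, not the paper's actual interpreter.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)  # maps (choice_point, action) -> estimated value

def smdp_q_update(s, a, reward, tau, s_next, actions_next):
    """One SMDP Q-learning backup between consecutive choice points.

    reward: discounted reward accumulated over the tau primitive
            steps executed between choice points s and s_next.
    tau:    number of primitive steps taken by the chosen segment.
    """
    # Value of the best choice available at the next choice point
    # (0.0 if the program has terminated and no choices remain).
    best_next = max((Q[(s_next, b)] for b in actions_next), default=0.0)
    # Standard Q-learning backup, with the discount raised to the
    # segment duration tau as required for semi-Markov decision processes.
    Q[(s, a)] += ALPHA * (reward + GAMMA ** tau * best_next - Q[(s, a)])

# Example: from an untrained table, a segment of 3 primitive steps
# yielding accumulated reward 1.0 moves Q toward that return.
smdp_q_update('choice0', 'left', 1.0, 3, 'choice1', ['left', 'right'])
```

Because updates happen only at choice points, the learner never needs value estimates for the deterministic program steps in between, which is what makes the SMDP formulation a natural fit for Golog programs.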