Reinforcement Learning for Golog Programs

A special feature of programs in the action language Golog is non-deterministic actions, which require an agent to make choices during program execution. In the presence of stochastic actions and rewards, Finzi and Lukasiewicz have shown how to arrive at optimal choices using reinforcement learning techniques applied to the first-order MDP representation induced by the program. In this paper we extend their ideas in two ways: we adopt a first-order SMDP representation, which allows Q-updates to be restricted to the non-deterministic choice points within a program, and we give a completely declarative specification of a learning Golog interpreter.
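To illustrate the kind of backup the SMDP view enables, the following is a minimal, hypothetical sketch of an SMDP Q-update applied only at non-deterministic choice points: between two choice points the agent runs a (possibly multi-step) program segment of duration tau, accumulating a discounted reward, and the backup then discounts the successor value by gamma**tau. All names here (ALPHA, GAMMA, smdp_q_update) are illustrative assumptions, not the paper's actual interpreter.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)  # maps (choice_point, action) -> estimated value

def smdp_q_update(s, a, reward, tau, s_next, actions_next):
    """One SMDP Q-learning backup between consecutive choice points.

    reward: discounted reward accumulated over the tau primitive
            steps executed between choice points s and s_next.
    tau:    number of primitive steps taken by the chosen segment.
    """
    # Value of the best choice available at the next choice point
    # (0.0 if the program has terminated and no choices remain).
    best_next = max((Q[(s_next, b)] for b in actions_next), default=0.0)
    # Standard Q-learning backup, with the discount raised to the
    # segment duration tau as required for semi-Markov decision processes.
    Q[(s, a)] += ALPHA * (reward + GAMMA ** tau * best_next - Q[(s, a)])

# Example: from an untrained table, a segment of 3 primitive steps
# yielding accumulated reward 1.0 moves Q toward that return.
smdp_q_update('choice0', 'left', 1.0, 3, 'choice1', ['left', 'right'])
```

Because updates happen only at choice points, the learner never needs value estimates for the deterministic program steps in between, which is what makes the SMDP formulation a natural fit for Golog programs.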