Finding Memoryless Probabilistic Relational Policies for Inter-task Reuse

Relational representations describe sequential decision problems in terms of objects and relations, yielding more compact, expressive, and domain-independent representations in which solutions can be found and generalized much faster than with propositional representations. In this paper we propose a modified policy iteration algorithm (AbsProb-PI) for the infinite-horizon discounted-reward criterion; the algorithm finds a memoryless probabilistic relational abstract policy that abstracts the solution of source problems well enough to be applied to new, similar problems. Experiments in robotic navigation validate our proposal and show that the resulting abstract policies are effective and efficient, outperforming solutions obtained by inductive approaches in the literature.
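
To make the infinite-horizon discounted-reward setting and the policy iteration loop concrete, below is a minimal, generic policy-iteration sketch. It is not the paper's AbsProb-PI: it operates on ground (propositional) states and returns a deterministic policy, whereas AbsProb-PI works over relational abstract states and outputs a memoryless probabilistic abstract policy. The two-state MDP at the bottom is a hypothetical example used only to exercise the loop.

```python
# Generic policy iteration for an infinite-horizon discounted-reward MDP.
# NOT the paper's AbsProb-PI; a standard textbook sketch for illustration.
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """P[a][s, s'] are transition probabilities, R[a][s] expected rewards."""
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)          # start with action 0 everywhere
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily w.r.t. the one-step lookahead.
        q = np.array([R[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

# Hypothetical 2-state, 2-action MDP, for illustration only.
P = [np.array([[0.8, 0.2], [0.1, 0.9]]),   # action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]   # action 1
R = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
print(policy_iteration(P, R))
```

In the abstract relational setting the evaluation and improvement steps would instead be carried out over abstract states induced by the relational description, and the improvement step would update action probabilities rather than pick a single greedy action; the sketch only shows the underlying discounted-reward machinery.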
