Exploiting Additive Structure in Factored MDPs for Reinforcement Learning

SDYNA is a framework for large, discrete, stochastic reinforcement learning problems. It incrementally learns a factored Markov decision process (FMDP) representing the problem to solve while using FMDP planning techniques to build an efficient policy. SPITI, an instantiation of SDYNA, uses a planning method based on dynamic programming, which cannot exploit the additive structure of an FMDP. In this paper, we present two new instantiations of SDYNA, namely ULP and UNATLP, which use a planning method based on linear programming that can exploit the additive structure of an FMDP and address problems out of reach of SPITI.
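
As a rough illustration of what "additive structure" means in this setting, the sketch below represents a reward function as a sum of local terms, each depending on only a few state variables. It is a toy example and not one of the algorithms presented in the paper: the variable names, the local terms, and the helper functions are all hypothetical and chosen only to show why a sum of small tables can be far more compact than one table over the full state space.

```python
# Hypothetical toy problem: four binary state variables.
VARIABLES = ["x1", "x2", "x3", "x4"]

# Hypothetical local reward terms. Each entry pairs a scope (the few
# variables the term depends on) with a function over that scope only.
LOCAL_REWARDS = [
    (("x1", "x2"), lambda x1, x2: 1.0 if x1 and x2 else 0.0),
    (("x3",), lambda x3: 0.5 if x3 else 0.0),
    (("x4",), lambda x4: 2.0 if x4 else 0.0),
]


def additive_reward(state):
    """Sum the local terms, evaluating each one on its own scope."""
    return sum(f(*(state[v] for v in scope)) for scope, f in LOCAL_REWARDS)


def flat_table_size():
    """Entries needed by a single table over all binary variables."""
    return 2 ** len(VARIABLES)


def factored_table_size():
    """Entries needed when each local term gets its own small table."""
    return sum(2 ** len(scope) for scope, _ in LOCAL_REWARDS)


if __name__ == "__main__":
    state = {"x1": True, "x2": True, "x3": False, "x4": True}
    print(additive_reward(state))
    print(flat_table_size(), factored_table_size())
```

In this toy case the flat table needs 16 entries while the three local tables need 8 in total; the gap grows exponentially with the number of variables as long as each scope stays small.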