Finding good stochastic factored policies for factored Markov decision processes

We propose a framework for the approximate solution of MDPs with factored state space, factored action space, and additive reward (FA-FMDPs), based on (i) considering stochastic factored policies (SFPs) with a given structure, (ii) using variational approximations to estimate SFP values, and (iii) using local continuous optimization algorithms to compute "good" SFPs. We implemented and tested an algorithm, CA-LBP, which combines loopy belief propagation with a coordinate ascent procedure. Experiments show that CA-LBP performs as well as a state-of-the-art algorithm dedicated to a specific sub-class of FA-FMDPs, and that it scales to general FA-FMDPs with up to 100 binary state variables and 100 binary action variables.
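To make the structure of CA-LBP concrete, the sketch below shows a coordinate ascent outer loop over the parameters of a stochastic factored policy. It is a minimal illustration, not the paper's implementation: the LBP-based variational value estimate is replaced by a placeholder `estimate_value`, the grid line search stands in for whatever local continuous optimization step the paper actually uses, and all function and parameter names are hypothetical.

```python
# Hypothetical sketch of a CA-LBP-style outer loop: coordinate ascent over
# SFP parameters, with a stand-in for the LBP-based value estimate.
import numpy as np

def estimate_value(theta: np.ndarray) -> float:
    """Placeholder for the variational (loopy BP) estimate of the SFP's value.

    In the paper this would run loopy belief propagation on the factored
    MDP's dynamic Bayesian network; here a smooth toy objective is used so
    the sketch runs end to end.
    """
    return -float(np.sum((theta - 0.3) ** 2))

def coordinate_ascent(n_params: int, grid: np.ndarray, n_sweeps: int = 20,
                      seed: int = 0) -> np.ndarray:
    """Sweep over coordinates, line-searching each one on a fixed grid."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, size=n_params)  # e.g. one parameter per action variable
    for _ in range(n_sweeps):
        improved = False
        for i in range(n_params):
            best_v, best_x = estimate_value(theta), theta[i]
            for x in grid:  # try each candidate value for coordinate i
                theta[i] = x
                v = estimate_value(theta)
                if v > best_v:
                    best_v, best_x, improved = v, x, True
            theta[i] = best_x  # keep the best value found for this coordinate
        if not improved:  # no coordinate improved: local optimum on this grid
            break
    return theta

if __name__ == "__main__":
    theta = coordinate_ascent(n_params=5, grid=np.linspace(0.0, 1.0, 21))
    print("locally optimal SFP parameters:", np.round(theta, 2))
    print("estimated value:", estimate_value(theta))
```

The design point this illustrates is the division of labor in the abstract: an inner routine estimates the value of a fixed SFP approximately (in the paper, via loopy belief propagation), while an outer local-search routine repeatedly perturbs one block of policy parameters at a time and keeps improvements, converging to a locally "good" SFP rather than a global optimum.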