Learning effective state-feedback controllers through efficient multilevel importance samplers

ABSTRACT Monte Carlo sampling can be used to estimate the solution of path integral control problems, a restricted class of nonlinear control problems with arbitrary dynamics and state cost, but with dynamics that depend linearly on the control and a quadratic control cost. Although importance sampling is used to improve the numerical computations, the effective sample size may still be low, or many samples may be required. In this work, we propose a method to learn effective state-feedback controllers for nonlinear stochastic control problems based on multilevel importance samplers. In particular, we focus on the question of how to compute effective importance samplers in a multigrid setting. We test our algorithm on finite-horizon control problems based on the Lorenz-96 model with chaotic and non-chaotic behaviour, showing, in all cases, that our multigrid implementation reduces the computational time and improves the effective sample size.
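As an illustration of the effective sample size mentioned above, the following minimal sketch computes the commonly used normalized estimate ESS = (sum_i w_i)^2 / (N sum_i w_i^2) for importance weights of the path-integral form w_i proportional to exp(-S_i / lambda). The function name `effective_sample_size`, the argument `costs` (one path cost S_i per sampled trajectory) and the temperature `lam` are hypothetical names chosen for this example, and the formula is the standard estimator rather than one confirmed to be the definition used in this work.

```python
import math


def effective_sample_size(costs, lam=1.0):
    """Normalized ESS in (0, 1] for weights w_i proportional to exp(-S_i / lam).

    `costs` holds one path cost S_i per sampled trajectory (assumed input);
    `lam` is the temperature-like parameter (assumed name).
    """
    # Shift by the minimum cost before exponentiating; the shift cancels
    # in the ratio below and avoids overflow for large costs.
    s_min = min(costs)
    weights = [math.exp(-(s - s_min) / lam) for s in costs]

    total = sum(weights)
    total_sq = sum(w * w for w in weights)

    # (sum w)^2 / sum w^2 counts the "equivalent" number of equally
    # weighted samples; dividing by N expresses it as a fraction.
    return (total * total) / (total_sq * len(costs))
```

With equal costs every sample carries the same weight and the fraction is 1; when a single trajectory dominates, the fraction approaches 1/N, which is the low-ESS regime that a better importance sampler is meant to avoid.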