Learning a Belief Representation for Delayed Reinforcement Learning
This paper considers sequential decision-making problems in which the interactions between an agent and its environment are affected by delays. Delays may affect the state observation, the action execution, or the reward collection. We consider the delayed Markov Decision Process (MDP) framework for both deterministic and stochastic delays. Given the hardness of the delayed MDP problem, we adopt a heuristic approach and design an algorithm that selects its action based on the belief over the current, unobserved state. To compute this belief, we design a self-attention prediction module which, given the last observed state and the subsequent sequence of actions, estimates the beliefs over the following states. The algorithm handles deterministic delays and could potentially be extended to stochastic ones. We empirically evaluate the effectiveness of the proposed approach on both deterministic and stochastic control problems affected by deterministic delays.
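
As a concrete illustration of the belief-prediction idea, the sketch below shows a minimal self-attention module in PyTorch that maps the last observed state and the buffer of pending actions to a diagonal-Gaussian belief over each of the delayed states. All class names, layer sizes, and the Gaussian parameterization are illustrative assumptions for this sketch, not the paper's exact architecture or training objective.

```python
import torch
import torch.nn as nn

class BeliefPredictor(nn.Module):
    """Hypothetical sketch: a self-attention module that turns the last
    observed state plus the buffer of pending actions into a belief
    (here a diagonal Gaussian) over each of the unobserved states."""

    def __init__(self, state_dim, action_dim, embed_dim=64,
                 n_heads=4, n_layers=2, max_delay=8):
        super().__init__()
        self.state_embed = nn.Linear(state_dim, embed_dim)
        self.action_embed = nn.Linear(action_dim, embed_dim)
        # Learned positional embeddings keep the action buffer ordered.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_delay + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Heads producing Gaussian belief parameters per delayed step.
        self.mean_head = nn.Linear(embed_dim, state_dim)
        self.log_std_head = nn.Linear(embed_dim, state_dim)

    def forward(self, last_state, pending_actions):
        # last_state: (batch, state_dim)
        # pending_actions: (batch, delay, action_dim)
        s = self.state_embed(last_state).unsqueeze(1)   # (batch, 1, embed)
        a = self.action_embed(pending_actions)          # (batch, delay, embed)
        tokens = torch.cat([s, a], dim=1)               # (batch, delay+1, embed)
        T = tokens.size(1)
        tokens = tokens + self.pos_embed[:, :T]
        # Causal mask: the belief after action t attends only to the
        # observed state and the actions up to t.
        mask = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=tokens.device),
            diagonal=1)
        h = self.encoder(tokens, mask=mask)
        # Belief over the state following each action in the buffer.
        mean = self.mean_head(h[:, 1:])
        std = self.log_std_head(h[:, 1:]).exp()
        return mean, std

# Example usage (shapes are illustrative): a delay of 3 steps in a
# 4-dimensional state, 2-dimensional action problem.
belief = BeliefPredictor(state_dim=4, action_dim=2)
mean, std = belief(torch.randn(1, 4), torch.randn(1, 3, 2))
```

In a delayed-control loop, the policy would then act on the belief over the current state, for instance on the predicted mean for the last pending action, `mean[:, -1]`, or on samples `mean + std * torch.randn_like(std)`.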