Reinforcement Learning Algorithms for MDPs

Reinforcement learning is a learning paradigm concerned with controlling a system so as to maximize a numerical performance measure that expresses a long-term objective. The goal in reinforcement learning is to develop efficient learning algorithms and to understand their merits and limitations. In this article we focus on a few selected reinforcement learning algorithms that build on the powerful theory of dynamic programming.

Keywords: reinforcement learning; Markov Decision Processes; temporal difference learning; stochastic approximation; function approximation; least-squares methods; Q-learning; actor-critic methods; policy gradient; natural gradient

