Asymptotic behaviour of a learning algorithm

The paper considers a learning automaton operating in a stationary random environment. The automaton has multiple actions and updates its action probability vector according to the linear reward-ϵ-penalty (LR-ϵP) algorithm. Using weak convergence concepts, it is shown that, for large time and small values of the algorithm's parameters, the evolution of the action probability vector can be represented by a Gauss-Markov diffusion.
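
To make the update rule concrete, the following is a minimal sketch of the standard LR-ϵP scheme, assuming the usual formulation: a reward step size a, a penalty step size b = ϵ·a with 0 < ϵ ≪ 1, and a binary environment response (β = 0 for reward, β = 1 for penalty). The parameter names, the reward probabilities d, and the simulation loop are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def lr_ep_step(p, action, beta, a, eps):
    """One LR-epsilon-P update of the action probability vector p (assumed standard form)."""
    r = len(p)
    b = eps * a  # penalty parameter is a small fraction of the reward parameter
    if beta == 0:
        # Reward: move probability mass toward the chosen action.
        q = (1 - a) * p
        q[action] = p[action] + a * (1 - p[action])
    else:
        # Penalty: redistribute a small amount of mass away from the chosen action.
        q = b / (r - 1) + (1 - b) * p
        q[action] = (1 - b) * p[action]
    return q

def simulate(d, a=0.01, eps=0.05, steps=50_000, seed=0):
    """Run the automaton in a stationary random environment where action i
    is rewarded with probability d[i] (hypothetical environment for illustration)."""
    rng = np.random.default_rng(seed)
    r = len(d)
    p = np.full(r, 1.0 / r)              # start from uniform action probabilities
    for _ in range(steps):
        action = rng.choice(r, p=p)      # sample an action from the current vector
        beta = 0 if rng.random() < d[action] else 1  # 0 = reward, 1 = penalty
        p = lr_ep_step(p, action, beta, a, eps)
    return p

if __name__ == "__main__":
    # Two-action environment: action 0 is rewarded more often than action 1.
    print(simulate(d=[0.8, 0.4]))
```

Both branches of the update preserve the probability simplex, and as a and ϵ shrink the sequence of probability vectors changes more slowly, which is the small-parameter regime in which the diffusion approximation described above is derived.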