Q-learning automaton

Reinforcement learning is the problem faced by a controller that must learn behavior through trial-and-error interactions with a dynamic environment. The controller's goal is to maximize reward over time by producing an effective mapping from states to actions, called a policy. To model such systems, we present a generalized learning automaton approach with Q-learning behaviors. Computational experiments on pursuit problems show that, compared with Q-learning, the proposed reinforcement scheme obtains better results in terms of convergence speed and memory size.
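For orientation, the Q-learning baseline referred to above can be sketched as the standard tabular update rule with an epsilon-greedy policy. This is a minimal illustration of textbook Q-learning, not the proposed learning automaton. The function names, the dictionary-based table, and the values of the learning rate `alpha`, discount factor `gamma`, and exploration rate `epsilon` are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def make_q_table():
    # Hypothetical helper: Q-values default to 0.0 for unseen (state, action) pairs.
    return defaultdict(float)


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q(state, action).

    alpha (learning rate) and gamma (discount factor) are illustrative defaults.
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def greedy_policy(q, state, actions):
    # The policy maps a state to the action with the highest current Q-value.
    return max(actions, key=lambda a: q[(state, a)])


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    # Trial-and-error exploration: with probability epsilon pick a random action.
    if rng.random() < epsilon:
        return rng.choice(actions)
    return greedy_policy(q, state, actions)
```

In this sketch the table stores one entry per visited state-action pair, so its size grows with the number of pairs encountered. That growth is the kind of memory cost the comparison above is concerned with.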