Learner modeling through reinforcement, with a temporal analysis using the Q-learning algorithm

This work runs simulations that vary the alpha (α, the learning rate) and gamma (γ, the temporal discount factor) parameters of the Q-learning algorithm, in order to analyze how the algorithm's convergence responds to changes in these parameters. It seeks to show that variations in alpha and gamma affect the convergence of the Q-learning algorithm and, consequently, learning in the ITS.
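
To make the role of the two parameters concrete, recall the standard Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. The following is a minimal sketch of a parameter sweep, not the simulation used in this work: the five-state chain environment, the epsilon-greedy exploration rate, the episode counts, and the function name `q_learning` are all assumptions chosen for illustration.

```python
import random


def q_learning(alpha, gamma, episodes, n_states=5, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only).

    Actions: 0 = move left, 1 = move right. Reaching the last state
    yields reward 1 and ends the episode; every other step yields 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # The terminal state contributes no future value.
            future = 0.0 if s_next == goal else max(q[s_next])
            # Q-learning update: alpha scales the step, gamma the future term.
            q[s][a] += alpha * (r + gamma * future - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    # Sweep a few (alpha, gamma) pairs and print the learned value of
    # moving right from the start state after a fixed number of episodes.
    for alpha in (0.01, 0.5):
        for gamma in (0.5, 0.9):
            q = q_learning(alpha, gamma, episodes=50)
            print(alpha, gamma, round(q[0][1], 4))
```

In this toy setting, a larger alpha moves each estimate toward its target in fewer visits, while gamma determines the value the estimates converge to (for moving right from state s, gamma raised to the number of remaining steps before the goal).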