An Improved Deep Q-learning Network Algorithm Based on Eligibility Traces

The Deep Q-learning Network (DQN) has become an important research direction in reinforcement learning. In practical applications, however, DQN tends to overestimate action values under certain conditions and incurs high cost. This paper proposes a new improved algorithm. The improved algorithm uses the eligibility trace of each state to drive experience replay, so that the samples most worth learning from are located more effectively. The eligibility trace is also incorporated into the max operation, which effectively mitigates the overestimation problem. During optimizer training, trace decay is taken into account, improving the learning effect and accelerating the convergence of the algorithm. Simulation results for the inverted pendulum system show that the new algorithm converges better, at lower cost and with less overestimation, than the natural DQN. The experimental results indicate that the new algorithm is effective for deep reinforcement learning and shows promise for further work.
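To make the three ideas in the abstract concrete, the following is a minimal, illustrative sketch of how an eligibility trace might be attached to replayed transitions, faded over time, used as a sampling priority, and folded into the bootstrapped max term. The class `TraceReplayBuffer`, the decay constant `LAM`, and the trace-weighted target are assumptions for illustration, not the paper's exact formulation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

GAMMA, LAM = 0.99, 0.8  # discount factor and assumed trace-decay factor


class QNet(nn.Module):
    """Small Q-network mapping observations to per-action values."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, x):
        return self.net(x)


class TraceReplayBuffer:
    """Replay buffer where each transition carries an eligibility value."""

    def __init__(self, capacity: int = 10_000):
        self.data = deque(maxlen=capacity)

    def push(self, s, a, r, s2, done):
        # Fade every stored trace, then add the new transition at full trace
        # (O(n) per step; acceptable for a sketch).
        for item in self.data:
            item["e"] *= GAMMA * LAM
        self.data.append({"s": s, "a": a, "r": r, "s2": s2,
                          "done": done, "e": 1.0})

    def sample(self, batch_size: int):
        # Trace-proportional sampling: recent, still-eligible transitions
        # are replayed more often.
        weights = [item["e"] for item in self.data]
        return random.choices(self.data, weights=weights, k=batch_size)


def td_targets(batch, q_target: QNet) -> torch.Tensor:
    """Bootstrapped targets with the trace folded into the max term,
    one plausible way to damp overestimation (an assumption here)."""
    with torch.no_grad():
        targets = []
        for item in batch:
            s2 = torch.as_tensor(item["s2"], dtype=torch.float32)
            q_max = q_target(s2).max().item()
            # Scale the bootstrap by the transition's eligibility so stale,
            # low-trace samples contribute a smaller max term.
            boot = 0.0 if item["done"] else GAMMA * item["e"] * q_max
            targets.append(item["r"] + boot)
    return torch.tensor(targets)
```

In this sketch the trace plays both roles described in the abstract: as a replay priority it concentrates updates on fresh, credit-bearing transitions, and as a weight on the max term it shrinks the bootstrapped value for stale samples, which is the mechanism by which an eligibility trace could reduce overestimation.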