Entropy-based prioritized sampling in Deep Q-learning

Online reinforcement learning agents take advantage of an experience replay memory that allows them to reuse past experiences for learning, thus improving the overall efficiency of the learning process. Prioritizing specific transitions during sampling and replay improves learning performance even further, but in previous approaches the priority of a transition was determined solely by its TD error. In this work, we introduce a novel criterion for evaluating the importance of a transition, based on the Shannon entropy of the agent's perceived state space. Furthermore, we compare the performance of different prioritization criteria on one of the simulation environments included in the REINFORCEjs framework. Experimental results show that DQ-ETD, which uses a combination of the entropy and TD error criteria, outperforms approaches based on the TD error criterion alone, such as DQ-TD.
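To make the combined criterion concrete, the following is a minimal sketch of how a transition's priority could be computed from its TD error magnitude and the Shannon entropy of the agent's perceived state; the weighting factor alpha and the treatment of the perceived state as a discrete distribution are assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    def shannon_entropy(state_values, eps=1e-12):
        # Shannon entropy of the agent's perceived state, treating its
        # (non-negative) feature activations as an unnormalized distribution.
        p = np.asarray(state_values, dtype=np.float64)
        p = p / (p.sum() + eps)
        return -np.sum(p * np.log2(p + eps))

    def transition_priority(td_error, state_values, alpha=0.5):
        # Hypothetical combined priority: a convex mix of TD error magnitude
        # and state entropy (alpha is an assumed weighting factor).
        return alpha * abs(td_error) + (1.0 - alpha) * shannon_entropy(state_values)

    # Example: priority of a transition with TD error 0.8 and a
    # four-dimensional perceived state.
    priority = transition_priority(0.8, [0.1, 0.4, 0.3, 0.2])

In such a scheme, sampling probabilities for replay would then be derived from these priorities in the usual prioritized-replay fashion, so that transitions with large TD error or high state uncertainty are replayed more often.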