Q-Learning with Differential Entropy of Q-Tables

It is well known that information loss can occur in the classic, simple Q-learning algorithm, and entropy-based policy search methods were introduced as more robust alternatives. We conjecture that the performance degradation observed during prolonged Q-learning training is caused by such information loss, which remains hidden when only the cumulative reward is examined. Rather than modifying the Q-learning algorithm itself, we introduce the Differential Entropy of Q-tables (DE-QT) as an external information-loss detector for Q-learning. The behaviour of DE-QT over training episodes is analyzed to derive a stopping criterion. The results reveal that DE-QT can detect the most appropriate stopping point, where the classic Q-learning algorithm balances a high success rate with high efficiency.
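The abstract does not specify how DE-QT is computed or how the stopping criterion is defined. A minimal sketch, assuming a Gaussian approximation of the Q-value distribution (for which the differential entropy is h = ½ ln(2πeσ²)) and a hypothetical plateau-based stopping rule, could look like this; the function names and the `window`/`tol` parameters are illustrative assumptions, not the paper's method:

```python
import math
import statistics

def q_table_differential_entropy(q_table):
    """Estimate the differential entropy of a Q-table.

    Assumption: treat all Q-values as samples from a Gaussian,
    whose differential entropy is 0.5 * ln(2 * pi * e * sigma^2).
    """
    values = [q for row in q_table for q in row]
    var = statistics.pvariance(values)
    if var == 0.0:
        # Degenerate case: all Q-values identical, entropy diverges to -inf.
        return float("-inf")
    return 0.5 * math.log(2 * math.pi * math.e * var)

def should_stop(entropy_history, window=5, tol=1e-3):
    """Hypothetical stopping rule: stop training once DE-QT has
    plateaued (varied by less than `tol`) over the last `window`
    recorded episodes."""
    if len(entropy_history) < window:
        return False
    recent = entropy_history[-window:]
    return max(recent) - min(recent) < tol

# Example: record DE-QT after each episode of an ordinary Q-learning
# loop (the loop itself is omitted), then check the stopping rule.
q_table = [[0.0, 1.0], [2.0, 3.0]]  # toy 2-state, 2-action table
history = [q_table_differential_entropy(q_table)]
stop = should_stop(history)  # False: not enough history yet
```

The Gaussian estimator is only one option; a histogram- or kernel-based entropy estimator would avoid the normality assumption at the cost of extra tuning.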
