Q-Learning with Differential Entropy of Q-Tables

It is well known that information loss can occur in the classic, simple Q-learning algorithm, and entropy-based policy search methods were introduced as more robust alternatives. We conjecture that the performance degradation observed during prolonged Q-learning training is caused by such information loss, which remains hidden when only the cumulative reward is examined. Rather than modifying the Q-learning algorithm itself, we introduce the Differential Entropy of Q-tables (DE-QT) as an external information-loss detector for Q-learning. The behaviour of DE-QT over training episodes is analyzed to derive a stopping criterion. The results reveal that DE-QT can detect the most appropriate stopping point, where the classic Q-learning algorithm balances a high success rate with high efficiency.
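The abstract does not specify how DE-QT is computed or how the stopping criterion is defined. A minimal sketch, assuming a Gaussian approximation of the Q-value distribution (for which the differential entropy is h = ½ ln(2πeσ²)) and a hypothetical plateau-based stopping rule, could look like this; the function names and the `window`/`tol` parameters are illustrative assumptions, not the paper's method:

```python
import math
import statistics

def q_table_differential_entropy(q_table):
    """Estimate the differential entropy of a Q-table.

    Assumption: treat all Q-values as samples from a Gaussian,
    whose differential entropy is 0.5 * ln(2 * pi * e * sigma^2).
    """
    values = [q for row in q_table for q in row]
    var = statistics.pvariance(values)
    if var == 0.0:
        # Degenerate case: all Q-values identical, entropy diverges to -inf.
        return float("-inf")
    return 0.5 * math.log(2 * math.pi * math.e * var)

def should_stop(entropy_history, window=5, tol=1e-3):
    """Hypothetical stopping rule: stop training once DE-QT has
    plateaued (varied by less than `tol`) over the last `window`
    recorded episodes."""
    if len(entropy_history) < window:
        return False
    recent = entropy_history[-window:]
    return max(recent) - min(recent) < tol

# Example: record DE-QT after each episode of an ordinary Q-learning
# loop (the loop itself is omitted), then check the stopping rule.
q_table = [[0.0, 1.0], [2.0, 3.0]]  # toy 2-state, 2-action table
history = [q_table_differential_entropy(q_table)]
stop = should_stop(history)  # False: not enough history yet
```

The Gaussian estimator is only one option; a histogram- or kernel-based entropy estimator would avoid the normality assumption at the cost of extra tuning.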
