Strategy Entropy as a Measure of Strategy Convergence in Reinforcement Learning

The concept of entropy is introduced into reinforcement learning, and definitions of local and global strategy entropy are presented. Experiments show that the global strategy entropy serves as a quantitative, problem-independent measure of the degree to which a strategy has converged. The experimental results also show that learning based on the local strategy entropy improves learning performance.
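The abstract does not give the formal definitions, so the sketch below is only an illustration under stated assumptions. It assumes that the local strategy entropy of a state is the Shannon entropy of the strategy's (policy's) action-selection distribution at that state, normalized by log |A| so it lies in [0, 1]. It also assumes that the global strategy entropy is the mean of these local values over all states, and that actions are chosen by a Boltzmann (softmax) rule over Q-values. All function names (`softmax`, `local_strategy_entropy`, `global_strategy_entropy`) are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def softmax(values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Boltzmann action-selection probabilities (an ASSUMED policy form)."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def local_strategy_entropy(probs: Sequence[float], normalize: bool = True) -> float:
    """ASSUMED definition: Shannon entropy of the action distribution at one state.

    With normalize=True the value is divided by log(|A|), so a uniform
    (unconverged) strategy gives 1.0 and a deterministic one gives 0.0.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    if normalize and len(probs) > 1:
        h /= math.log(len(probs))
    return h


def global_strategy_entropy(policy: Dict[str, Sequence[float]]) -> float:
    """ASSUMED aggregation: mean of normalized local entropies over all states."""
    if not policy:
        raise ValueError("policy must contain at least one state")
    return sum(local_strategy_entropy(p) for p in policy.values()) / len(policy)


if __name__ == "__main__":
    # Toy Q-table: s0 has no preference yet, s1 strongly prefers action 0.
    q_table = {"s0": [0.0, 0.0, 0.0], "s1": [5.0, 0.0, 0.0]}
    policy = {s: softmax(q, temperature=0.5) for s, q in q_table.items()}
    for state, probs in policy.items():
        print(state, round(local_strategy_entropy(probs), 4))
    print("global", round(global_strategy_entropy(policy), 4))
```

Under these assumptions, the global value falls toward 0 as every state's action distribution becomes deterministic, which is the intuition behind using it as a convergence measure. A learner could also consult the local value of a state, for example to keep exploring where it is still high, but how the paper actually uses the local entropy is not stated in the abstract.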
