Exploration Entropy for Reinforcement Learning

Analysing the training process of a Reinforcement Learning (RL) system, and deciding when that training should terminate, have long been key issues in training an RL agent. In this paper, a new approach to analysing the training process is proposed, based on State Entropy and Exploration Entropy. State Entropy denotes the uncertainty of an RL agent when selecting an action at each state it traverses, while Exploration Entropy denotes the action-selection uncertainty of the whole system. The action-selection uncertainty of a single state, or of the whole system, reflects the agent's degree of exploration and its current stage in the learning process. Exploration Entropy therefore serves as a new criterion for analysing and managing the training process of RL. Theoretical analysis and experimental results illustrate that the Exploration Entropy curve carries more information than existing analysis methods provide.
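
The abstract does not give formulas, so the following is only a minimal sketch under stated assumptions. It assumes a tabular agent with Boltzmann (softmax) action selection, takes State Entropy to be the Shannon entropy of the action-selection distribution at a state, and takes Exploration Entropy, hypothetically, to be the mean State Entropy over all states. The helper names (`softmax`, `state_entropy`, `exploration_entropy`), the `temperature` parameter and the averaging rule are illustrative choices, not definitions taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def softmax(q_values: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Boltzmann action-selection probabilities (an assumed policy form)."""
    m = max(q_values)  # subtract the max Q-value for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def state_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of the action distribution at one state."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def exploration_entropy(q_table: Dict[str, Sequence[float]],
                        temperature: float = 1.0) -> float:
    """Hypothetical system-level measure: mean State Entropy over all states."""
    if not q_table:
        raise ValueError("q_table must contain at least one state")
    entropies = [state_entropy(softmax(q, temperature)) for q in q_table.values()]
    return sum(entropies) / len(entropies)


if __name__ == "__main__":
    # s0: all actions look equally good (maximal uncertainty, log2(4) = 2 bits)
    # s1: one action dominates (the agent is nearly certain)
    q = {"s0": [0.0, 0.0, 0.0, 0.0], "s1": [10.0, 0.0, 0.0, 0.0]}
    print(exploration_entropy(q))
```

A state whose Q-values are all equal contributes the maximum of log2 4 = 2 bits, while a state with one dominant action contributes almost nothing, so the example table scores just over 1 bit. Tracked across training episodes, a falling value of this kind would suggest the agent is shifting from exploration towards exploitation.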
