Adaptive exploration strategies for reinforcement learning

In reinforcement learning, an agent learns a policy by trial and error in order to achieve a goal. Applying it in a real environment raises two difficulties. First, it is hard to decide how to partition the state space. Second, because the agent selects actions according to its policy during learning, we must balance exploitation and exploration: whether to explore new areas to gain experience, or to maximize reward from existing knowledge. To address these problems, we propose a decision-tree-based adaptive state space segmentation algorithm, and on top of it use decreasing Tabu search and an adaptive exploration strategy to balance exploitation and exploration. In decreasing Tabu search, each action the agent takes is placed on a Tabu list; when the list is full, the oldest action is released, and the list size decreases with the number of times the agent successfully reaches the goal. The adaptive exploration strategy sets the exploration rate from information entropy, so it does not need to be tuned manually. Finally, a maze-environment simulation is used to validate the proposed method and shows a reduction in learning time.
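The two mechanisms described above can be sketched in code. This is a minimal illustration, not the authors' exact formulation: the entropy-based rate here uses a softmax over Q-values (the specific mapping from entropy to exploration rate is an assumption), and the Tabu list shrinks by one slot per successful episode (the exact decrease schedule is also an assumption).

```python
import math
from collections import deque

def exploration_rate(q_values, eps_min=0.05, eps_max=1.0):
    """Entropy-based adaptive exploration rate (sketch).
    High entropy over the softmax of Q-values means the agent is
    uncertain about which action is best, so it should explore more."""
    m = max(q_values)
    exp_q = [math.exp(q - m) for q in q_values]   # stable softmax
    z = sum(exp_q)
    probs = [e / z for e in exp_q]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(q_values))          # entropy of a uniform distribution
    ratio = entropy / max_entropy if max_entropy > 0 else 0.0
    # Scale exploration between eps_min (confident) and eps_max (uncertain).
    return eps_min + (eps_max - eps_min) * ratio

class DecreasingTabuList:
    """Tabu list whose capacity shrinks as the agent reaches the goal more often."""
    def __init__(self, initial_size=8, min_size=1):
        self.size = initial_size
        self.min_size = min_size
        self.items = deque(maxlen=initial_size)

    def add(self, action):
        # When the list is full, appending releases the oldest action.
        self.items.append(action)

    def is_tabu(self, action):
        return action in self.items

    def on_goal_reached(self):
        # Shrink the capacity after each successful episode, keeping
        # only the most recent actions that still fit.
        self.size = max(self.min_size, self.size - 1)
        self.items = deque(list(self.items)[-self.size:], maxlen=self.size)
```

With uniform Q-values the softmax entropy is maximal, so `exploration_rate` returns `eps_max`; as one action's value dominates, the rate falls toward `eps_min`. The shrinking Tabu list restricts repeated actions early on and interferes less once the agent reliably reaches the goal.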
