An Adaptive Approach for the Exploration-Exploitation Dilemma in Non-Stationary Environments

A central problem in reinforcement learning is balancing exploration and exploitation in non-stationary environments. To address this problem, a data-driven Q-learning algorithm is presented. First, an information system of behavior is built from the agent's experience. Then, a trigger mechanism is constructed that traces changes in the environment using the uncertain knowledge of the information system. This dynamic information about the environment is used to balance exploration and exploitation in a self-driven way. The algorithm is illustrated on grid-world navigation tasks, and simulation results show that it clearly improves learning efficiency.
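The abstract does not spell out how the trigger mechanism is implemented, so the following is only a minimal sketch of the general idea: tabular Q-learning with an epsilon-greedy policy whose exploration rate is raised when a hypothetical change-detection signal fires. Here the "information system" and "trigger mechanism" are approximated by comparing a short window of recent TD errors against a longer baseline; the class name, thresholds, and window sizes are illustrative assumptions, not the paper's method.

```python
# Sketch: Q-learning with an adaptive exploration rate driven by a simple
# environment-change trigger (a jump in recent TD error). All constants and
# the surprise-based detector are assumptions for illustration only.
import random
from collections import defaultdict, deque

ALPHA, GAMMA = 0.1, 0.95           # learning rate and discount factor
EPS_MIN, EPS_MAX = 0.05, 0.9       # bounds on the exploration rate
TRIGGER_RATIO = 2.0                # hypothetical change-detection threshold

class AdaptiveQAgent:
    def __init__(self, actions):
        self.actions = actions
        self.q = defaultdict(float)            # Q-values keyed by (state, action)
        self.eps = EPS_MAX                     # start with high exploration
        self.recent_err = deque(maxlen=50)     # short window of |TD error|
        self.baseline_err = deque(maxlen=500)  # longer baseline window

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next, done):
        # Standard one-step Q-learning update.
        target = r if done else r + GAMMA * max(self.q[(s_next, b)] for b in self.actions)
        td_err = target - self.q[(s, a)]
        self.q[(s, a)] += ALPHA * td_err

        # Track surprise: if the recent mean |TD error| exceeds the long-run
        # baseline by TRIGGER_RATIO, assume the environment has changed and
        # re-explore; otherwise decay exploration toward EPS_MIN.
        self.recent_err.append(abs(td_err))
        self.baseline_err.append(abs(td_err))
        recent = sum(self.recent_err) / len(self.recent_err)
        baseline = sum(self.baseline_err) / len(self.baseline_err)
        if len(self.baseline_err) > 100 and recent > TRIGGER_RATIO * baseline:
            self.eps = EPS_MAX                 # exploration burst after a detected change
        else:
            self.eps = max(EPS_MIN, self.eps * 0.995)
```

In a grid-world navigation task of the kind mentioned in the abstract, such an agent would keep exploration low once the policy converges, and raise it again only when the reward or transition structure shifts, which is the self-driven balance the paper describes.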
