A new criterion using information gain for action selection strategy in reinforcement learning

In this paper, we regard the sequence of returns as outputs from a parametric compound source. Because the coding rate of this source reflects the amount of information about the return, we describe ℓ-learning algorithms, based on the idea of predictive coding, that estimate the expected information gain concerning future information, and we prove that the estimated information gain converges. Using this information gain, we propose the ratio ω of return loss to information gain as a new criterion for probabilistic action-selection strategies. Experimental results show that our ω-based strategy performs well compared with the conventional Q-based strategy.
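The abstract does not say exactly how ω enters the action-selection rule, so the following is only a minimal sketch under stated assumptions. It assumes that the return loss of action a is max_b Q(s,b) − Q(s,a). It treats the per-action information-gain estimates as given inputs, standing in for the paper's ℓ-learning estimates, which are not reproduced here. Finally, it feeds −ω into a Boltzmann (softmax) distribution, so that actions with small loss and large expected information gain are favoured. The function names `omega_scores`, `omega_softmax_probs` and `select_action`, the `eps` guard, and the softmax form are all hypothetical choices, not the authors' method.

```python
import math
import random
from typing import List, Sequence


def omega_scores(q_values: Sequence[float], info_gain: Sequence[float],
                 eps: float = 1e-8) -> List[float]:
    """Hypothetical omega(a) = return loss / information gain.

    Assumption: return loss is the gap to the greedy action's Q-value,
    and info_gain[a] is an externally supplied (non-negative) estimate.
    """
    best = max(q_values)
    # eps avoids division by zero when an action promises no information
    return [(best - q) / max(g, eps) for q, g in zip(q_values, info_gain)]


def omega_softmax_probs(q_values: Sequence[float], info_gain: Sequence[float],
                        temperature: float = 1.0) -> List[float]:
    """Boltzmann distribution over -omega / temperature (assumed form)."""
    omegas = omega_scores(q_values, info_gain)
    # Lower omega is better, so negate it before exponentiating.
    logits = [-w / temperature for w in omegas]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def select_action(q_values: Sequence[float], info_gain: Sequence[float],
                  temperature: float = 1.0, rng=random) -> int:
    """Sample an action index from the omega-based distribution."""
    probs = omega_softmax_probs(q_values, info_gain, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 0.8, 0.2]       # illustrative Q-values, not from the paper
    gain = [0.1, 0.5, 0.05]   # illustrative information-gain estimates
    print(omega_scores(q, gain))
    print(omega_softmax_probs(q, gain))
    print(select_action(q, gain, rng=random.Random(0)))
```

With these illustrative numbers, ω is 0 for the greedy action, about 0.4 for the second action and 16 for the third. The second action keeps a sizeable probability because it is expected to be informative, while the third is almost never chosen. Replacing −ω with Q in the logits recovers ordinary Q-based Boltzmann exploration, which is one plausible reading of the "conventional Q-based strategy" used for comparison.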

Kazushi Ikeda | Hideaki Sakai | Kazunori Iwata