A new criterion using information gain for action selection strategy in reinforcement learning
[1] Peter Dayan, et al. Technical Note: Q-Learning, 2004, Machine Learning.
[2] J. Rissanen. A Universal Prior for Integers and Estimation by Minimum Description Length, 1983.
[3] Leslie Pack Kaelbling, et al. Learning in Embedded Systems, 1993.
[4] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[5] Te Sun Han (韓 太舜), et al. Mathematics of Information and Coding, 2002.
[6] Makoto Sato, et al. Variance-Penalized Reinforcement Learning for Risk-Averse Asset Allocation, 2000, IDEAL.
[7] Tommi S. Jaakkola, et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms, 2000, Machine Learning.
[8] Harold J. Kushner, et al. Stochastic Approximation Algorithms and Applications, 1997, Applications of Mathematics.
[9] J. Rissanen. Stochastic Complexity and Modeling, 1986.
[10] Makoto Sato, et al. Average-Reward Reinforcement Learning for Variance Penalized Markov Decision Problems, 2001, International Conference on Machine Learning.
[11] Stuart J. Russell, et al. Bayesian Q-Learning, 1998, AAAI/IAAI.
[12] J. Rissanen, et al. Modeling by Shortest Data Description, 1978, Automatica.
[13] H. Akaike. A New Look at the Statistical Model Identification, 1974.
[14] Aristidis Likas, et al. A Reinforcement Learning Approach to Online Clustering, 1999, Neural Computation.