Bayesian Q-Learning
[1] Milton Abramowitz, et al. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 1964.
[2] Ronald A. Howard, et al. Information Value Theory, 1966, IEEE Trans. Syst. Sci. Cybern.
[3] Irene A. Stegun, et al. Handbook of Mathematical Functions, 1966.
[4] Donald A. Berry, et al. Bandit Problems: Sequential Allocation of Experiments, 1986.
[5] P. W. Jones, et al. Bandit Problems, Sequential Allocation of Experiments, 1987.
[6] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[7] Stuart J. Russell, et al. Do the right thing - studies in limited rationality, 1991.
[8] D. Sofge. The Role of Exploration in Learning Control, 1992.
[9] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[10] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.
[11] Simon Parsons, et al. Do the right thing - studies in limited rationality by Stuart Russell and Eric Wefald, MIT Press, Cambridge, MA, £24.75, ISBN 0-262-18144-4, 1994, The Knowledge Engineering Review.
[12] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[13] Jeremy Wyatt, et al. Exploration and inference in learning from reinforcement, 1998.