Estimating the Maximum Expected Value through Upper Confidence Bound of Likelihood