Estimating Maximum Expected Value through Gaussian Approximation
Marcello Restelli | Carlo D'Eramo | Alessandro Nuara
[1] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[2] D. Bhaeiyal Ishwaei, et al. Non-existence of unbiased estimators of ordered parameters, 1985.
[3] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[4] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[5] Tao Qin, et al. Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising, 2013, NIPS.
[6] Warren B. Powell, et al. An Intelligent Battery Controller Using Bias-Corrected Q-learning, 2012, AAAI.
[7] A. Cohen, et al. Estimation of the Larger of Two Normal Means, 1968.
[8] Warren B. Powell, et al. Bias-corrected Q-learning to control max-operator bias in Q-learning, 2013, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[9] Hado van Hasselt, et al. Estimating the Maximum Expected Value: An Analysis of (Nested) Cross Validation and the Maximum Sample Average, 2013, ArXiv.
[10] E. Steen. Rational Overoptimism (and Other Biases), 2004.
[11] Robert L. Winkler, et al. The Optimizer's Curse: Skepticism and Postdecision Surprise in Decision Analysis, 2006, Manag. Sci.
[12] M. Stone. Cross-Validatory Choice and Assessment of Statistical Predictions, 1976.