Optimism in the face of uncertainty supported by a statistically-designed multi-armed bandit algorithm
[1] Moto Kamiura, et al. Overtaking method based on sand-sifter mechanism: Why do optimistic value functions find optimal solutions in multi-armed bandit problems?, 2015, Biosystems.
[2] Moto Kamiura, et al. Overtaking Method based on Variance of Values: Resolving the Exploration–Exploitation Dilemma, 2013.
[3] James A. Shepperd, et al. Exploring the causes of comparative optimism, 2002.
[4] Csaba Szepesvári, et al. Exploration–exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theoretical Computer Science.
[5] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[6] A. Agresti, et al. Approximate is Better than "Exact" for Interval Estimation of Binomial Proportions, 1998.
[7] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Transactions on Neural Networks.
[8] Timothy D. Ross, et al. Accurate confidence intervals for binomial proportion and Poisson rate estimation, 2003, Computers in Biology and Medicine.
[9] Nathan R. Sturtevant, et al. An Analysis of UCT in Multi-Player Games, 2008, ICGA Journal.
[10] E. B. Wilson. Probable Inference, the Law of Succession, and Statistical Inference, 1927.
[11] Rémi Munos, et al. Pure exploration in finitely-armed and continuous-armed bandits, 2011, Theoretical Computer Science.
[12] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.
[13] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[14] R. Newcombe. Two-sided confidence intervals for the single proportion: comparison of seven methods, 1998, Statistics in Medicine.
[15] Sylvain Gelly, et al. Exploration exploitation in Go: UCT for Monte-Carlo Go, 2006, NIPS 2006.
[16] J. Reiczigel, et al. Confidence intervals for the binomial parameter: some new considerations, 2003, Statistics in Medicine.
[17] Sean Wallis, et al. Binomial Confidence Intervals and Contingency Tests: Mathematical Fundamentals and the Evaluation of Alternative Methods, 2013, Journal of Quantitative Linguistics.
[18] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Foundations and Trends in Machine Learning.