Multi-Bandit Best Arm Identification
Victor Gabillon | Mohammad Ghavamzadeh | Alessandro Lazaric | Sébastien Bubeck