Optimal Simple Regret in Bayesian Best Arm Identification
Junpei Komiyama | Kaito Ariu | Masahiro Kato | Chao Qin