Conservative Bandits
Yifan Wu | Tor Lattimore | Csaba Szepesvári | Roshan Shariff
[1] Marcus Hutter, et al. Adaptive Online Prediction by Following the Perturbed Leader, 2005, J. Mach. Learn. Res.
[2] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[3] Aurélien Garivier, et al. Informational confidence bounds for self-normalized averages and applications, 2013, 2013 IEEE Information Theory Workshop (ITW).
[4] Aurélien Garivier, et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models, 2014, J. Mach. Learn. Res.
[5] Tor Lattimore, et al. The Pareto Regret Frontier for Bandits, 2015, NIPS.
[6] Oliver Lemon, et al. Learning Effective Multimodal Dialogue Strategies from Wizard-of-Oz Data: Bootstrapping and Evaluation, 2008, ACL.
[7] Gergely Neu, et al. Explore no more: Improved high-probability regret bounds for non-stochastic bandits, 2015, NIPS.
[8] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[9] Alessandro Lazaric, et al. Exploiting easy data in online optimization, 2014, NIPS.
[10] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[11] Yishay Mansour, et al. Regret to the best vs. regret to the average, 2007, Machine Learning.
[12] Matthew Malloy, et al. lil' UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits, 2013, COLT.
[13] Martin A. Riedmiller, et al. Distributed policy search reinforcement learning for job-shop scheduling tasks, 2012.
[14] Zoran Popovic, et al. Towards automatic experimentation of educational knowledge, 2014, CHI.
[15] Wouter M. Koolen. The Pareto Regret Frontier, 2013, NIPS.
[16] Tor Lattimore, et al. Optimally Confident UCB: Improved Regret for Finite-Armed Bandits, 2015, ArXiv.
[17] Alkis Gotovos, et al. Safe Exploration for Optimization with Gaussian Processes, 2015, ICML.
[18] H. Robbins, et al. Sequential choice from several populations, 1995, Proceedings of the National Academy of Sciences of the United States of America.