Fighting Bandits with a New Kind of Smoothness
[1] D. Bertsekas. Stochastic optimization problems with nondifferentiable cost functionals, 1973.
[2] Ohad Shamir, et al. Relax and Randomize: From Value to Algorithms, 2012, NIPS.
[3] Rémi Munos, et al. Efficient learning by implicit exploration in bandit problems with side observations, 2014, NIPS.
[4] James Hannan. Approximation to Bayes Risk in Repeated Play, 1958.
[5] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[6] Guy Van den Broeck, et al. Monte-Carlo Tree Search in Poker Using Expected Reward Distributions, 2009, ACML.
[7] Ambuj Tewari, et al. Online Linear Optimization via Smoothing, 2014, COLT.
[8] J. Corcoran. Modelling Extremal Events for Insurance and Finance, 2002.
[9] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.
[10] Manfred K. Warmuth, et al. The Weighted Majority Algorithm, 1994, Inf. Comput.
[11] Thomas P. Hayes, et al. The Price of Bandit Information for Online Optimization, 2007, NIPS.
[12] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[13] Luc Devroye, et al. Prediction by random-walk perturbation, 2013, COLT.
[14] Wojciech Kotlowski, et al. Follow the Leader with Dropout Perturbations, 2014, COLT.
[15] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[16] Gergely Neu, et al. An Efficient Algorithm for Learning with Semi-bandit Feedback, 2013, ALT.
[17] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[18] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[19] Christian M. Ernst, et al. Multi-armed Bandit Allocation Indices, 1989.
[20] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[21] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, J. Comput. Syst. Sci.
[22] Avrim Blum, et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary, 2004, COLT.
[23] Elad Hazan, et al. Interior-Point Methods for Full-Information and Bandit Online Learning, 2012, IEEE Transactions on Information Theory.
[24] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[25] Paul Embrechts, et al. Modelling of extremal events in insurance and finance, 1994, Math. Methods Oper. Res.
[26] Jean-Paul Penot, et al. Sub-hessians, super-hessians and conjugation, 1994.
[27] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[28] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.
[29] John Gittins, et al. Quantitative Methods in the Planning of Pharmaceutical Research, 1996.
[30] C. Tsallis. Possible generalization of Boltzmann-Gibbs statistics, 1988.
[31] Gábor Lugosi, et al. Minimax Policies for Combinatorial Prediction Games, 2011, COLT.
[32] Thomas P. Hayes, et al. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary, 2006, SODA '06.
[33] C. Klüppelberg, et al. Modelling Extremal Events, 1997.
[34] Una-May O'Reilly, et al. Hyperparameter Tuning in Bandit-Based Adaptive Operator Selection, 2012, EvoApplications.
[35] Tapio Elomaa, et al. On Following the Perturbed Leader in the Bandit Setting, 2005, ALT.
[36] H. Robbins. Some aspects of the sequential design of experiments, 1952.