暂无分享,去创建一个
[1] Ohad Shamir,et al. Bandit Regret Scaling with the Effective Loss Range , 2017, ALT.
[2] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[3] Peter Auer,et al. Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring , 2006, ALT.
[4] Tor Lattimore,et al. Exploration by Optimisation in Partial Monitoring , 2019, COLT.
[5] Haipeng Luo,et al. Improved Path-length Regret Bounds for Bandits , 2019, COLT.
[6] Elad Hazan,et al. Introduction to Online Convex Optimization , 2016, Found. Trends Optim..
[7] Jean-Yves Audibert,et al. Minimax Policies for Adversarial and Stochastic Bandits. , 2009, COLT 2009.
[8] Haipeng Luo,et al. More Adaptive Algorithms for Adversarial Bandits , 2018, COLT.
[9] Gábor Lugosi,et al. Regret in Online Combinatorial Optimization , 2012, Math. Oper. Res..
[10] Claudio Gentile,et al. Adaptive and Self-Confident On-Line Learning Algorithms , 2000, J. Comput. Syst. Sci..
[11] Elad Hazan,et al. Better Algorithms for Benign Bandits , 2009, J. Mach. Learn. Res..
[12] Francesco Orabona. A Modern Introduction to Online Learning , 2019, ArXiv.
[13] Csaba Szepesvari,et al. Bandit Algorithms , 2020 .
[14] Shai Shalev-Shwartz,et al. Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..
[15] Wouter M. Koolen,et al. Adaptive Hedge , 2011, NIPS.
[16] Jürgen Schmidhuber,et al. Algorithm portfolio selection as a bandit problem with unbounded losses , 2011, Annals of Mathematics and Artificial Intelligence.
[17] Julian Zimmert,et al. Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits , 2018, J. Mach. Learn. Res..
[18] Gábor Lugosi,et al. Minimax Policies for Combinatorial Prediction Games , 2011, COLT.
[19] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[20] Tor Lattimore,et al. On First-Order Bounds, Variance and Gap-Dependent Bounds for Adversarial Bandits , 2019, UAI.
[21] Yuanzhi Li,et al. Sparsity, variance and curvature in multi-armed bandits , 2017, ALT.
[22] Francesco Orabona,et al. Scale-free online learning , 2016, Theor. Comput. Sci..
[23] Éva Tardos,et al. Learning in Games: Robustness of Fast Convergence , 2016, NIPS.
[24] Aleksandrs Slivkins,et al. Introduction to Multi-Armed Bandits , 2019, Found. Trends Mach. Learn..
[25] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[26] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[27] András György,et al. A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds , 2017, ALT.
[28] Tor Lattimore,et al. Refined Lower Bounds for Adversarial Bandits , 2016, NIPS.
[29] Csaba Szepesvári,et al. A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds , 2020, Theor. Comput. Sci..
[30] Wouter M. Koolen,et al. Follow the leader if you can, hedge if you must , 2013, J. Mach. Learn. Res..