Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs
[1] M. Mohri, et al. Stochastic Online Learning with Feedback Graphs: Finite-Time and Asymptotic Optimality, 2022, NeurIPS.
[2] J. Honda, et al. Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs, 2022, NeurIPS.
[3] Chihao Zhang, et al. Understanding Bandits with Graph Feedback, 2021, NeurIPS.
[4] Csaba Szepesvári, et al. Bandit Algorithms, 2020.
[5] Haipeng Luo, et al. Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs, 2020, NeurIPS.
[6] Haipeng Luo, et al. A Closer Look at Small-loss Bounds for Bandits with Graph Feedback, 2020, COLT.
[7] Éva Tardos, et al. Small-loss bounds for online learning with partial information, 2017, COLT.
[8] Fang Liu, et al. Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks, 2017, J. Mach. Learn. Res.
[9] Tomer Koren, et al. Online Learning with Feedback Graphs Without the Graphs, 2016, ICML.
[10] Gergely Neu, et al. Explore no more: Improved high-probability regret bounds for non-stochastic bandits, 2015, NIPS.
[11] N. Alon, et al. Online Learning with Feedback Graphs: Beyond Bandits, 2015, COLT.
[12] Rémi Munos, et al. Efficient learning by implicit exploration in bandit problems with side observations, 2014, NIPS.
[13] Noga Alon, et al. Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback, 2014, SIAM J. Comput.
[14] Marc Lelarge, et al. Leveraging Side Observations in Stochastic Bandits, 2012, UAI.
[15] Shie Mannor, et al. From Bandits to Experts: On the Value of Side-Observations, 2011, NIPS.
[16] John Langford, et al. Contextual Bandit Algorithms with Supervised Learning Guarantees, 2010, AISTATS.
[17] Jacob D. Abernethy, et al. Beating the adaptive bandit with high probability, 2009, Information Theory and Applications Workshop.
[18] Thomas P. Hayes, et al. High-Probability Regret Bounds for Bandit Online Linear Optimization, 2008, COLT.
[19] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[20] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.
[21] Tor Lattimore, et al. Return of the bias: Almost minimax optimal high probability bounds for adversarial linear bandits, 2022, COLT.
[22] Tomer Koren, et al. Towards Best-of-All-Worlds Online Learning with Feedback Graphs, 2021, NeurIPS.
[23] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.