Graph regret bounds for Thompson Sampling and UCB
[1] Shipra Agrawal, et al. Near-Optimal Regret Bounds for Thompson Sampling, 2017, J. ACM.
[2] Shie Mannor, et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res..
[3] Fang Liu, et al. Analysis of Thompson Sampling for Graphical Bandits Without the Graphs, 2018, UAI.
[4] Atilla Eryilmaz, et al. Stochastic bandits with side observations on networks, 2014, SIGMETRICS '14.
[5] Benjamin Van Roy, et al. An Information-Theoretic Analysis of Thompson Sampling, 2014, J. Mach. Learn. Res..
[6] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[7] Peter Auer, et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, 2010, Period. Math. Hung..
[8] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Adv. Appl. Math..
[9] Benjamin Van Roy, et al. Learning to Optimize via Information-Directed Sampling, 2014, NIPS.
[10] Marc Lelarge, et al. Leveraging Side Observations in Stochastic Bandits, 2012, UAI.
[11] Fang Liu, et al. Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks, 2017, J. Mach. Learn. Res..
[12] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[13] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.
[14] Christos Dimitrakakis, et al. Thompson Sampling for Stochastic Bandits with Graph Feedback, 2017, AAAI.
[15] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[16] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[17] Éva Tardos, et al. Small-loss bounds for online learning with partial information, 2017, COLT.
[18] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[19] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[20] Jianping Pan, et al. Problem-dependent Regret Bounds for Online Learning with Feedback Graphs, 2019, UAI.
[21] Shie Mannor, et al. From Bandits to Experts: On the Value of Side-Observations, 2011, NIPS.
[22] Rémi Munos, et al. Efficient learning by implicit exploration in bandit problems with side observations, 2014, NIPS.
[23] Noga Alon, et al. Online Learning with Feedback Graphs: Beyond Bandits, 2015, COLT.
[24] Shipra Agrawal, et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds, 2017, NIPS.
[25] Noga Alon, et al. Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback, 2014, SIAM J. Comput..
[26] Michal Valko, et al. Online Learning with Noisy Side Observations, 2016, AISTATS.
[27] Tamir Hazan, et al. Online Learning with Feedback Graphs Without the Graphs, 2016, ICML.
[28] Shipra Agrawal, et al. Further Optimal Regret Bounds for Thompson Sampling, 2012, AISTATS.
[29] Nicolò Cesa-Bianchi, et al. Bandits With Heavy Tail, 2012, IEEE Transactions on Information Theory.
[30] Fang Liu, et al. Information Directed Sampling for Stochastic Bandits with Graph Feedback, 2017, AAAI.