暂无分享,去创建一个
[1] Sébastien Bubeck,et al. First-Order Regret Analysis of Thompson Sampling , 2019, ArXiv.
[2] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[3] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..
[4] G. Lugosi,et al. On Prediction of Individual Sequences , 1998 .
[5] Benjamin Van Roy,et al. An Information-Theoretic Analysis of Thompson Sampling , 2014, J. Mach. Learn. Res..
[6] Benjamin Van Roy,et al. A Tutorial on Thompson Sampling , 2017, Found. Trends Mach. Learn..
[7] Marcus Hutter,et al. Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet , 2003, J. Mach. Learn. Res..
[8] Eli Upfal,et al. Probability and Computing: Randomized Algorithms and Probabilistic Analysis , 2005 .
[9] G. Lugosi,et al. On Prediction of Individual Sequences , 1998 .
[10] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[11] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[12] Yoav Freund,et al. Predicting a binary sequence almost as well as the optimal biased coin , 2003, COLT '96.
[13] Stéphane Gaïffas,et al. On the optimality of the Hedge algorithm in the stochastic regime , 2018, J. Mach. Learn. Res..
[14] Aleksandrs Slivkins,et al. One Practical Algorithm for Both Stochastic and Adversarial Bandits , 2014, ICML.
[15] Neri Merhav,et al. Universal Prediction , 1998, IEEE Trans. Inf. Theory.
[16] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[17] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[18] Aditya Gopalan. Thompson Sampling for Online Learning with Linear Experts , 2013, ArXiv.
[19] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[20] Yuval Peres,et al. Towards Optimal Algorithms for Prediction with Expert Advice , 2014, SODA.
[21] Aleksandrs Slivkins,et al. 25th Annual Conference on Learning Theory The Best of Both Worlds: Stochastic and Adversarial Bandits , 2022 .
[22] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[23] Benjamin Van Roy,et al. Why is Posterior Sampling Better than Optimism for Reinforcement Learning? , 2016, ICML.
[24] Andrew R. Barron,et al. Asymptotic minimax regret for data compression, gambling, and prediction , 1997, IEEE Trans. Inf. Theory.
[25] Thomas M. Cover,et al. Behavior of sequential predictors of binary sequences , 1965 .
[26] Long Tran-Thanh,et al. Efficient Thompson Sampling for Online Matrix-Factorization Recommendation , 2015, NIPS.
[27] Aleksandrs Slivkins,et al. Introduction to Multi-Armed Bandits , 2019, Found. Trends Mach. Learn..
[28] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[29] Shipra Agrawal,et al. Near-Optimal Regret Bounds for Thompson Sampling , 2017, J. ACM.
[30] Karthik Sridharan,et al. Statistical Learning and Sequential Prediction , 2014 .
[31] Steven L. Scott,et al. A modern Bayesian look at the multi-armed bandit , 2010 .
[32] Neri Merhav,et al. Universal prediction of individual sequences , 1992, IEEE Trans. Inf. Theory.
[33] Jean-Yves Audibert,et al. Minimax Policies for Adversarial and Stochastic Bandits. , 2009, COLT 2009.
[34] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[35] W. Hoeffding. Probability Inequalities for sums of Bounded Random Variables , 1963 .
[36] Shipra Agrawal,et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds , 2022, NIPS.