The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] D. Teneketzis, et al. Asymptotically Efficient Adaptive Allocation Schemes for Controlled I.I.D. Processes: Finite Parameter Space, 1988.
[3] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[4] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[5] H. Robbins, et al. Sequential choice from several populations, 1995, Proceedings of the National Academy of Sciences of the United States of America.
[6] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[7] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[8] Baruch Awerbuch, et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches, 2004, STOC '04.
[9] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[10] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[11] Alexandre B. Tsybakov, et al. Introduction to Nonparametric Estimation, 2008, Springer Series in Statistics.
[12] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.
[13] John N. Tsitsiklis, et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.
[14] Aurélien Garivier, et al. Parametric Bandits: The Generalized Linear Case, 2010, NIPS.
[15] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[16] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[17] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[18] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[19] Csaba Szepesvári, et al. Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits, 2012, AISTATS.
[20] Omar Rivasplata. Subgaussian random variables: An expository note, 2012.
[21] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[22] Rémi Munos, et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits, 2013, NIPS.
[23] Rémi Munos, et al. Spectral Bandits for Smooth Graph Functions, 2014, ICML.
[24] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[25] Alessandro Lazaric, et al. Best-Arm Identification in Linear Bandits, 2014, NIPS.
[26] Benjamin Van Roy, et al. Learning to Optimize via Information-Directed Sampling, 2014, NIPS.
[27] Shie Mannor, et al. Thompson Sampling for Learning Parameterized Markov Decision Processes, 2014, COLT.
[28] Hiroshi Nakagawa, et al. Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring, 2015, NIPS.
[29] Yifan Wu, et al. Online Learning with Gaussian Payoffs and Side Observations, 2015, NIPS.
[30] Tor Lattimore, et al. Refined Lower Bounds for Adversarial Bandits, 2016, NIPS.
[31] Gautam Reddy, et al. Infomax Strategies for an Optimal Balance Between Exploration and Exploitation, 2016, Journal of Statistical Physics.
[32] Aurélien Garivier, et al. Optimal Best Arm Identification with Fixed Confidence, 2016, COLT.
[33] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.