Bounded Regret for Finite-Armed Structured Bandits
[1] Umar Syed,et al. Bandits, Query Learning, and the Haystack Dimension , 2011, COLT.
[2] T. L. Lai, H. Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985, Advances in Applied Mathematics.
[3] Sébastien Bubeck,et al. Prior-free and prior-dependent regret bounds for Thompson Sampling , 2013, 2014 48th Annual Conference on Information Sciences and Systems (CISS).
[4] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[5] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[6] T. L. Graves,et al. Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains , 1997 .
[7] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[8] Peter Auer,et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem , 2010, Period. Math. Hung..
[9] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[10] Rémi Munos,et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits , 2013, NIPS.
[11] Csaba Szepesvári,et al. Variance estimates and exploration function in multi-armed bandit , 2008 .
[12] Vianney Perchet,et al. Bounded regret in stochastic multi-armed bandits , 2013, COLT.
[13] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[14] John N. Tsitsiklis,et al. A Structured Multiarmed Bandit Problem and the Greedy Policy , 2008, IEEE Transactions on Automatic Control.
[15] Benjamin Van Roy,et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration , 2013, NIPS.
[16] Alessandro Lazaric,et al. Sequential Transfer in Multi-armed Bandit with Finite Set of Models , 2013, NIPS.
[17] R. Agrawal,et al. Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space , 1989 .
[18] H. Robbins,et al. Optimal sequential sampling from two populations , 1984, Proceedings of the National Academy of Sciences of the United States of America.
[19] Csaba Szepesvári,et al. Online Optimization in X-Armed Bandits , 2008, NIPS.