Emilie Kaufmann | Dorian Baudry | Odalric-Ambrym Maillard
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] J. Halton, et al. Algorithm 247: Radical-inverse quasi-random point sequence, 1964, CACM.
[3] I. Sobol. On the distribution of points in a cube and the approximate evaluation of integrals, 1967.
[4] S. T. Buckland, et al. An Introduction to the Bootstrap, 1994.
[5] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[6] A. Burnetas, et al. Optimal Adaptive Policies for Sequential Allocation Problems, 1996.
[7] Robert F. Tichy, et al. Sequences, Discrepancies and Applications, 1997.
[8] Jonathan A. Tawn, et al. Modelling Dependence within Joint Tail Regions, 1997.
[9] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[10] Karol Pak, et al. Stirling Numbers of the Second Kind, 2005.
[11] M. Kenward, et al. An Introduction to the Bootstrap, 2007.
[12] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[13] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[14] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[15] R. Munos, et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation, 2012, arXiv:1210.1136.
[16] Shipra Agrawal, et al. Further Optimal Regret Bounds for Thompson Sampling, 2012, AISTATS.
[17] Rémi Munos, et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits, 2013, NIPS.
[18] Shie Mannor, et al. Sub-sampling for Multi-armed Bandits, 2014, ECML/PKDD.
[19] Akimichi Takemura, et al. Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards, 2015, J. Mach. Learn. Res.
[20] Benjamin Van Roy, et al. Bootstrapped Thompson Sampling and Deep Exploration, 2015, arXiv.
[21] Brian F. Hutton, et al. What is the distribution of the number of unique original items in a bootstrap sample?, 2016, arXiv:1602.05822.
[22] Tor Lattimore, et al. Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits, 2018, ICML.
[23] Junya Honda, et al. Bandit Algorithms Based on Thompson Sampling for Bounded Reward Distributions, 2020, ALT.
[24] H. Chan. The multi-armed bandit problem: An efficient nonparametric solution, 2020.
[25] Yang Yu, et al. Residual Bootstrap Exploration for Bandit Algorithms, 2020, arXiv.
[26] Csaba Szepesvari, et al. Bandit Algorithms, 2020.
[27] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.