On Bayesian index policies for sequential resource allocation
[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[2] R. Bellman. A Problem in the Sequential Design of Experiments, 1954.
[3] R. N. Bradt, et al. On Sequential Designs for Maximizing the Sum of $n$ Observations, 1956.
[4] J. Gittins. Bandit Processes and Dynamic Allocation Indices, 1979.
[5] T. Lai. Adaptive Treatment Allocation and the Multi-Armed Bandit Problem, 1987.
[6] T. Lai, et al. Optimal Stopping and Dynamic Allocation, 1987, Advances in Applied Probability.
[7] P. W. Jones, et al. Bandit Problems, Sequential Allocation of Experiments, 1987.
[8] Christian M. Ernst, et al. Multi-Armed Bandit Allocation Indices, 1989.
[9] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[10] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[11] R. Agrawal. Sample Mean Based Index Policies by O(log n) Regret for the Multi-Armed Bandit Problem, 1995, Advances in Applied Probability.
[12] Murray K. Clayton, et al. Small-Sample Performance of Bernoulli Two-Armed Bandit Bayesian Strategies, 1999.
[13] A. Burnetas, et al. Asymptotic Bayes Analysis for the Finite-Horizon One-Armed-Bandit Problem, 2003, Probability in the Engineering and Informational Sciences.
[14] Peter Auer, et al. Finite-Time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[15] Csaba Szepesvári, et al. Exploration-Exploitation Tradeoff Using Variance Estimates in Multi-Armed Bandits, 2009, Theoretical Computer Science.
[16] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[17] Akimichi Takemura, et al. An Asymptotically Optimal Bandit Algorithm for Bounded Support Models, 2010, COLT.
[18] Jean-Yves Audibert, et al. Regret Bounds and Minimax Policies under Partial Monitoring, 2010, Journal of Machine Learning Research.
[19] U. Rieder, et al. Markov Decision Processes, 2010.
[20] Nando de Freitas, et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning, 2010, arXiv.
[21] Steven L. Scott. A Modern Bayesian Look at the Multi-Armed Bandit, 2010.
[22] José Niño-Mora. Computing a Classic Index for Finite-Horizon Bandits, 2011, INFORMS Journal on Computing.
[23] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[24] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[25] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[26] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[27] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-Armed Bandit Problem, 2011, COLT.
[28] R. Munos, et al. Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, 2012, arXiv:1210.1136.
[29] Shipra Agrawal, et al. Further Optimal Regret Bounds for Thompson Sampling, 2012, AISTATS.
[30] S. Boucheron, et al. Concentration Inequalities: A Nonasymptotic Theory of Independence, 2013.
[31] Rémi Munos, et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits, 2013, NIPS.
[32] Akimichi Takemura, et al. Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors, 2013, AISTATS.
[33] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Mathematics of Operations Research.
[34] Emilie Kaufmann. Analysis of Bayesian and Frequentist Strategies for Sequential Resource Allocation, PhD thesis, 2014.
[35] Vaibhav Srivastava, et al. Modeling Human Decision Making in Generalized Gaussian Multiarmed Bandits, 2013, Proceedings of the IEEE.
[36] Sébastien Bubeck, et al. Prior-Free and Prior-Dependent Regret Bounds for Thompson Sampling, 2013, 48th Annual Conference on Information Sciences and Systems (CISS).
[37] Benjamin Van Roy, et al. Learning to Optimize via Information-Directed Sampling, 2014, NIPS.
[38] Lihong Li, et al. On the Prior Sensitivity of Thompson Sampling, 2015, ALT.
[39] Tor Lattimore. Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits, 2015, COLT.
[40] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.