Asymptotically optimal algorithms for budgeted multiple play bandits
[1] Nikhil R. Devanur, et al. Bandits with concave rewards and convex knapsacks, 2014, EC.
[2] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[3] Nenghai Yu, et al. Thompson Sampling for Budgeted Multi-Armed Bandits, 2015, IJCAI.
[4] Nenghai Yu, et al. Budgeted Multi-Armed Bandits with Multiple Plays, 2016, IJCAI.
[5] Gábor Lugosi, et al. Minimax Policies for Combinatorial Prediction Games, 2011, COLT.
[6] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, 1933.
[7] Hiroshi Nakagawa, et al. Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays, 2015, ICML.
[8] Aurélien Garivier, et al. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, 2016, Math. Oper. Res.
[9] Yingce Xia, et al. Infinitely Many-Armed Bandits with Budget Constraints, 2016, AAAI.
[10] A. Burnetas, et al. Optimal Adaptive Policies for Sequential Allocation Problems, 1996.
[11] Branislav Kveton, et al. Efficient Learning in Large-Scale Combinatorial Semi-Bandits, 2014, ICML.
[12] Zheng Wen, et al. Matroid Bandits: Fast Combinatorial Optimization with Learning, 2014, UAI.
[13] Nicolò Cesa-Bianchi, et al. Combinatorial Bandits, 2012, COLT.
[14] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[15] Antoine Chambaz, et al. Asymptotically Optimal Algorithms for Multiple Play Bandits with Partial Feedback, 2016, arXiv.
[16] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[17] Alexandre Proutière, et al. Learning to Rank, 2015, SIGMETRICS.
[18] Archie C. Chapman, et al. Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits, 2012, AAAI.
[19] Richard M. Karp. Reducibility Among Combinatorial Problems, 1972, 50 Years of Integer Programming.
[20] Wei Chen, et al. Combinatorial Multi-Armed Bandit: General Framework and Applications, 2013, ICML.
[21] Aleksandrs Slivkins, et al. Bandits with Knapsacks, 2013, FOCS.
[22] J. Walrand, et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays, Part II: Markovian rewards, 1987.
[23] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[24] G. Dantzig. Discrete-Variable Extremum Problems, 1957.
[25] Rémi Munos, et al. Thompson Sampling for 1-Dimensional Exponential Family Bandits, 2013, NIPS.
[26] Nenghai Yu, et al. Budgeted Bandit Problems with Continuous Random Costs, 2015, ACML.
[27] Aleksandrs Slivkins, et al. Combinatorial Semi-Bandits with Knapsacks, 2017, AISTATS.
[28] Zheng Wen, et al. Combinatorial Cascading Bandits, 2015, NIPS.
[29] Alexandre Proutière, et al. Combinatorial Bandits Revisited, 2015, NIPS.
[30] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem, 1987.
[31] Zheng Wen, et al. Cascading Bandits: Learning to Rank in the Cascade Model, 2015, ICML.
[32] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[33] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[34] Long Tran-Thanh. Budget-limited multi-armed bandits, 2012.
[35] R Core Team. R: A Language and Environment for Statistical Computing, 2014.
[36] Shipra Agrawal, et al. Further Optimal Regret Bounds for Thompson Sampling, 2012, AISTATS.
[37] R. Munos, et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation, 2012, arXiv:1210.1136.