KL-UCB-Based Policy for Budgeted Multi-Armed Bandits with Stochastic Action Costs
Junpei Komiyama | Atsuyoshi Nakamura | Mineichi Kudo | Ryo Watanabe
[1] Jun Wang et al. Real-Time Bidding Benchmarking with iPinYou Dataset. arXiv, 2014.
[2] T. Lai. Adaptive Treatment Allocation and the Multi-Armed Bandit Problem. 1987.
[3] Archie C. Chapman et al. ε-First Policies for Budget-Limited Multi-Armed Bandits. AAAI, 2010.
[4] Peter Auer et al. The Nonstochastic Multiarmed Bandit Problem. SIAM J. Comput., 2002.
[5] Aurélien Garivier et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. COLT, 2011.
[6] Nenghai Yu et al. Thompson Sampling for Budgeted Multi-Armed Bandits. IJCAI, 2015.
[7] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables. 1963.
[8] Archie C. Chapman et al. Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits. AAAI, 2012.
[9] Akimichi Takemura et al. An Asymptotically Optimal Bandit Algorithm for Bounded Support Models. COLT, 2010.
[10] Nicholas R. Jennings et al. Efficient Regret Bounds for Online Bid Optimisation in Budget-Limited Sponsored Search Auctions. UAI, 2014.
[11] H. Robbins et al. Asymptotically Efficient Adaptive Allocation Rules. 1985.
[12] Tao Qin et al. Multi-Armed Bandit with Budget Constraint and Variable Costs. AAAI, 2013.
[13] Anton Schwaighofer et al. Budget Optimization for Sponsored Search: Censored Learning in MDPs. UAI, 2012.
[14] Michael Ostrovsky et al. Reserve Prices in Internet Advertising Auctions: A Field Experiment. Journal of Political Economy, 2009.
[15] Rémi Munos et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis. ALT, 2012.
[16] W. R. Thompson. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. 1933.
[17] R. Munos et al. Kullback–Leibler Upper Confidence Bounds for Optimal Sequential Allocation. arXiv:1210.1136, 2012.
[18] Peter Auer et al. Finite-Time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002.
[19] H. Robbins. Some Aspects of the Sequential Design of Experiments. 1952.