KL-UCB-Based Policy for Budgeted Multi-Armed Bandits with Stochastic Action Costs

Junpei Komiyama | Atsuyoshi Nakamura | Mineichi Kudo | Ryo Watanabe

[1] Jun Wang, et al. Real-Time Bidding Benchmarking with iPinYou Dataset, 2014, arXiv.

[2] T. Lai. Adaptive Treatment Allocation and the Multi-Armed Bandit Problem, 1987.

[3] Archie C. Chapman, et al. ε-First Policies for Budget-Limited Multi-Armed Bandits, 2010, AAAI.

[4] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[5] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.

[6] Nenghai Yu, et al. Thompson Sampling for Budgeted Multi-Armed Bandits, 2015, IJCAI.

[7] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[8] Archie C. Chapman, et al. Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits, 2012, AAAI.

[9] Akimichi Takemura, et al. An Asymptotically Optimal Bandit Algorithm for Bounded Support Models, 2010, COLT.

[10] Nicholas R. Jennings, et al. Efficient Regret Bounds for Online Bid Optimisation in Budget-Limited Sponsored Search Auctions, 2014, UAI.

[11] H. Robbins, et al. Asymptotically Efficient Adaptive Allocation Rules, 1985.

[12] Tao Qin, et al. Multi-Armed Bandit with Budget Constraint and Variable Costs, 2013, AAAI.

[13] Anton Schwaighofer, et al. Budget Optimization for Sponsored Search: Censored Learning in MDPs, 2012, UAI.

[14] Michael Ostrovsky, et al. Reserve Prices in Internet Advertising Auctions: A Field Experiment, 2009, Journal of Political Economy.

[15] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.

[16] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.

[17] R. Munos, et al. Kullback–Leibler Upper Confidence Bounds for Optimal Sequential Allocation, 2012, arXiv:1210.1136.

[18] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[19] H. Robbins. Some Aspects of the Sequential Design of Experiments, 1952.