Multi-armed bandit problem with known trend
[1] Jonathan L. Shapiro, et al. Thompson Sampling in Switching Environments with Bayesian Online Change Point Detection, 2013, AISTATS.
[2] Aurélien Garivier, et al. On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems, 2008, arXiv:0805.3415.
[3] Michèle Sebag, et al. Multi-armed Bandit, Dynamic Environments and Meta-Bandits, 2006.
[4] Jason L. Loeppky, et al. Improving Online Marketing Experiments with Drifting Multi-armed Bandits, 2015, ICEIS.
[5] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[6] Raphaël Féraud, et al. A Neural Networks Committee for the Contextual Bandit Problem, 2014, ICONIP.
[7] Varun Kanade, et al. Sleeping Experts and Bandits with Stochastic Action Availability and Adversarial Rewards, 2009, AISTATS.
[8] P. Whittle. Restless bandits: activity allocation in a changing world, 1988, Journal of Applied Probability.
[9] Mitsunori Ogihara, et al. NextOne Player: A Music Recommendation System Based on User Behavior, 2011, ISMIR.
[10] J. Niño-Mora. Restless Bandits, Partial Conservation Laws and Indexability, 2001.
[11] Romain Laroche, et al. Contextual Bandit for Active Learning: Active Thompson Sampling, 2014, ICONIP.
[12] Pushmeet Kohli, et al. On user behaviour adaptation under interface change, 2014, IUI.
[13] Filip Radlinski, et al. Mortal Multi-Armed Bandits, 2008, NIPS.
[14] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[15] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.