Multi-armed Bandits with Compensation
[1] Donald A. Berry, et al. Bandit Problems: Sequential Allocation of Experiments, 1986.
[2] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[3] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[4] Archie C. Chapman, et al. Knapsack Based Optimal Policies for Budget-Limited Multi-Armed Bandits, 2012, AAAI.
[5] Rémi Munos, et al. Adaptive Bandits: Towards the Best History-Dependent Strategy, 2011, AISTATS.
[6] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[7] Yeon-Koo Che, et al. Recommender Systems as Mechanisms for Social Learning, 2018.
[8] R. Srikant, et al. Bandits with Budgets, 2015, SIGMETRICS.
[9] Jon M. Kleinberg, et al. Incentivizing Exploration, 2014, EC.
[10] Ambuj Tewari, et al. PAC Subset Selection in Stochastic Multi-armed Bandits, 2012, ICML.
[11] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[12] Andrew W. Moore, et al. The Racing Algorithm: Model Selection for Lazy Learners, 1997, Artificial Intelligence Review.
[13] Kostas Bimpikis, et al. Crowdsourcing Exploration, 2018, Manag. Sci.
[14] H. Robbins. Some Aspects of the Sequential Design of Experiments, 1952.
[15] Y. Freund, et al. The Non-stochastic Multi-armed Bandit Problem, 2001.
[16] Shipra Agrawal, et al. Further Optimal Regret Bounds for Thompson Sampling, 2012, AISTATS.
[17] Yishay Mansour, et al. Bayesian Exploration: Incentivizing Exploration in Bayesian Games, 2016, EC.
[18] Yishay Mansour, et al. Bayesian Incentive-Compatible Bandit Exploration, 2018.
[19] Nenghai Yu, et al. Thompson Sampling for Budgeted Multi-Armed Bandits, 2015, IJCAI.
[20] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[21] John Langford, et al. Contextual Bandit Algorithms with Supervised Learning Guarantees, 2010, AISTATS.