A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem
[1] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[2] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[3] Naomi Ehrich Leonard, et al. Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem, 2016, 2019 18th European Control Conference (ECC).
[4] Vaibhav Srivastava, et al. Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms, 2016, 2016 IEEE 55th Conference on Decision and Control (CDC).
[5] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[6] Naumaan Nayyar, et al. Decentralized Learning for Multiplayer Multiarmed Bandits, 2014, IEEE Transactions on Information Theory.
[7] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[8] Vaibhav Srivastava, et al. Social Imitation in Cooperative Multiarmed Bandits: Partition-Based Algorithms with Strictly Local Information, 2018, 2018 IEEE Conference on Decision and Control (CDC).
[9] Aditya Gopalan, et al. Collaborative learning of stochastic bandits over a social network, 2016, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[10] Vaibhav Srivastava, et al. Modeling Human Decision Making in Generalized Gaussian Multiarmed Bandits, 2013, Proceedings of the IEEE.
[11] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[12] Vaibhav Srivastava, et al. On distributed cooperative decision-making in multiarmed bandits, 2015, 2016 European Control Conference (ECC).
[13] J. Walrand, et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards, 1987.
[14] Eduardo F. Morales, et al. An Introduction to Reinforcement Learning, 2011.
[15] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem, 1987.