Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem
暂无分享,去创建一个
Naomi Ehrich Leonard | Udari Madhushani | D. H. S. Maithripala | J. Berg | J. Wijayakulasooriya | T. Madhushani
[1] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[2] Aditya Gopalan,et al. Collaborative learning of stochastic bandits over a social network , 2016, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[3] Eduardo F. Morales,et al. An Introduction to Reinforcement Learning , 2011 .
[4] T. L. Lai Andherbertrobbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .
[5] S. Levin,et al. Maintaining cooperation in social-ecological systems: , 2017, Theoretical Ecology.
[6] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[7] Vaibhav Srivastava,et al. Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms , 2016, 2016 IEEE 55th Conference on Decision and Control (CDC).
[8] Vaibhav Srivastava,et al. Social Imitation in Cooperative Multiarmed Bandits: Partition-Based Algorithms with Strictly Local Information , 2018, 2018 IEEE Conference on Decision and Control (CDC).
[9] Vaibhav Srivastava,et al. On distributed cooperative decision-making in multiarmed bandits , 2015, 2016 European Control Conference (ECC).
[10] J. Walrand,et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards , 1987 .
[11] Aurélien Garivier,et al. On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems , 2008, 0805.3415.
[12] Aurélien Garivier,et al. On Bayesian Upper Confidence Bounds for Bandit Problems , 2012, AISTATS.
[13] Naumaan Nayyar,et al. Decentralized Learning for Multiplayer Multiarmed Bandits , 2014, IEEE Transactions on Information Theory.
[14] Eric Moulines,et al. On Upper-Confidence Bound Policies for Switching Bandit Problems , 2011, ALT.
[15] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[16] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem , 1987 .
[17] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[18] Vaibhav Srivastava,et al. Modeling Human Decision Making in Generalized Gaussian Multiarmed Bandits , 2013, Proceedings of the IEEE.