Decentralized learning for multi-player multi-armed bandits
[1] Wenhan Dai,et al. Efficient online learning for opportunistic spectrum access , 2012, 2012 Proceedings IEEE INFOCOM.
[2] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .
[3] Mingyan Liu,et al. Online learning in opportunistic spectrum access: A restless bandit approach , 2010, 2011 Proceedings IEEE INFOCOM.
[4] Mingyan Liu,et al. Online Learning of Rested and Restless Bandits , 2011, IEEE Transactions on Information Theory.
[5] R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[6] Qing Zhao,et al. Distributed Learning in Multi-Armed Bandit With Multiple Players , 2009, IEEE Transactions on Signal Processing.
[7] D. Bertsekas. The auction algorithm: A distributed relaxation method for the assignment problem , 1988 .
[8] P. Lezaud. Chernoff-type bound for finite Markov chains , 1998 .
[9] Dimitri P. Bertsekas,et al. Auction algorithms for network flow problems: A tutorial introduction , 1992, Comput. Optim. Appl.
[10] George J. Pappas,et al. A distributed auction algorithm for the assignment problem , 2008, 2008 47th IEEE Conference on Decision and Control.
[11] Bhaskar Krishnamachari,et al. Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations , 2010, IEEE/ACM Transactions on Networking.
[12] Qing Zhao,et al. Multi-armed bandit problems with heavy-tailed reward distributions , 2011, 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[13] Mingyan Liu,et al. On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards , 2011, 2011 IEEE Global Telecommunications Conference - GLOBECOM 2011.
[14] Mingyan Liu,et al. Online algorithms for the multi-armed bandit problem with Markovian rewards , 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[15] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[16] Vijay K. Bhargava,et al. Cognitive Wireless Communication Networks , 2007 .
[17] Qing Zhao,et al. Learning in a Changing World: Restless Multiarmed Bandit With Unknown Dynamics , 2010, IEEE Transactions on Information Theory.
[18] Yi Gai,et al. Decentralized Online Learning Algorithms for Opportunistic Spectrum Access , 2011, 2011 IEEE Global Telecommunications Conference - GLOBECOM 2011.
[19] J. Walrand,et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards , 1987 .
[20] J. Lamperti. ON CONVERGENCE OF STOCHASTIC PROCESSES , 1962 .
[21] Wenhan Dai,et al. The non-Bayesian restless multi-armed bandit: A case of near-logarithmic regret , 2010, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] John N. Tsitsiklis,et al. The Complexity of Optimal Queuing Network Control , 1999, Math. Oper. Res.
[23] Ananthram Swami,et al. Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret , 2010, IEEE Journal on Selected Areas in Communications.