[1] Ananthram Swami, et al. Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret, 2010, IEEE Journal on Selected Areas in Communications.
[2] Joseph Mitola, et al. Cognitive radio: making software radios more personal, 1999, IEEE Wirel. Commun.
[3] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[4] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[5] R. Munos, et al. Kullback–Leibler upper confidence bounds for optimal sequential allocation, 2012, arXiv:1210.1136.
[6] W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples, 1933.
[7] Aurélien Garivier, et al. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, 2016, Math. Oper. Res.
[8] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.
[9] Roi Livni, et al. Bandits with Movement Costs and Adaptive Pricing, 2017, COLT.
[10] Y. Freund, et al. The non-stochastic multi-armed bandit problem, 2001.
[11] Rémi Munos, et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis, 2012, ALT.
[12] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[13] Ao Tang, et al. Opportunistic Spectrum Access with Multiple Users: Learning under Competition, 2010, 2010 Proceedings IEEE INFOCOM.
[14] Shie Mannor, et al. Learning to coordinate without communication in multi-user multi-armed bandit problems, 2015, arXiv.
[15] Shie Mannor, et al. Multi-user lax communications: A multi-armed bandit approach, 2015, IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications.
[16] Qing Zhao, et al. Distributed Learning in Multi-Armed Bandit With Multiple Players, 2009, IEEE Transactions on Signal Processing.
[17] Rómer Rosales, et al. Simple and Scalable Response Prediction for Display Advertising, 2014, ACM Trans. Intell. Syst. Technol.
[18] Ohad Shamir, et al. Multi-player bandits: a musical chairs approach, 2016, ICML.
[19] Jacques Palicot, et al. Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings, 2017, CrownCom.
[20] Naumaan Nayyar, et al. Decentralized learning for multi-player multi-armed bandits, 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[21] J. Walrand, et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards, 1987.
[22] Wassim Jouini, et al. Multi-armed bandit based policies for cognitive radio's decision making issues, 2009, 2009 3rd International Conference on Signals, Circuits and Systems (SCS).
[23] D. Ernst, et al. Upper Confidence Bound Based Decision Making Strategies and Dynamic Spectrum Access, 2010, 2010 IEEE International Conference on Communications.
[24] Shipra Agrawal, et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem, 2011, COLT.
[25] Junpei Komiyama, et al. Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays, 2015, ICML.
[26] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[27] Djallel Bouneffouf, et al. Finite-time analysis of the multi-armed bandit problem with known trend, 2016, 2016 IEEE Congress on Evolutionary Computation (CEC).
[28] Mingyan Liu, et al. Online learning in decentralized multi-user spectrum access with synchronized explorations, 2012, MILCOM 2012 - 2012 IEEE Military Communications Conference.
[29] H. Robbins. Some aspects of the sequential design of experiments, 1952.