Regret bounds for restless Markov bandits
Peter Auer | Rémi Munos | Ronald Ortner | Daniil Ryabko
[1] Phuong Nguyen, et al. Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning, 2013, ICML.
[2] V. Climenhaga. Markov chains and mixing times, 2013.
[3] Ronald Ortner, et al. Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, 2012, NIPS.
[4] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[5] Rémi Munos, et al. Selecting the State-Representation in Reinforcement Learning, 2011, NIPS.
[6] Mingyan Liu, et al. Adaptive learning of uncontrolled restless bandits with logarithmic regret, 2011, 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[7] Aurélien Garivier, et al. Optimally Sensing a Single Channel Without Prior Information: The Tiling Algorithm and Regret Bounds, 2011, IEEE Journal of Selected Topics in Signal Processing.
[8] Vittorio Ferrari, et al. Advances in Neural Information Processing Systems 24, 2011.
[9] Jean-Yves Audibert, et al. Minimax Policies for Adversarial and Stochastic Bandits, 2009, COLT.
[10] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[11] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[12] Doina Precup, et al. Bounding Performance Loss in Approximate MDP Homomorphisms, 2008, NIPS.
[13] Marcus Hutter, et al. On the Possibility of Learning in Reactive Environments with Arbitrary Dependence, 2008, Theor. Comput. Sci.
[14] Ian F. Akyildiz, et al. A survey on spectrum management in cognitive radio networks, 2008, IEEE Communications Magazine.
[15] Ronald Ortner, et al. Pseudometrics for State Aggregation in Average Reward Markov Decision Processes, 2007, ALT.
[16] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[17] Robert Givan, et al. Equivalence notions and model minimization in Markov decision processes, 2003, Artif. Intell.
[18] J. van Leeuwen, et al. Theoretical Computer Science, 2003, Lecture Notes in Computer Science.
[19] Balaraman Ravindran, et al. Model Minimization in Hierarchical Reinforcement Learning, 2002, SARA.
[20] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[21] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[22] D. Aldous. Threshold limits for cover times, 1991.
[23] David J. Aldous. Lower bounds for covering times for reversible Markov chains and random walks on graphs, 1989.
[24] P. Whittle. Restless bandits: activity allocation in a changing world, 1988, Journal of Applied Probability.
[25] J. Walrand, et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays - Part II: Markovian rewards, 1987.
[26] M. Nair. On Chebyshev-Type Inequalities for Primes, 1982.
[27] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[28] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.