Continuous Time Associative Bandit Problems
András György | Csaba Szepesvári | Levente Kocsis | Ivett Szabó
[1] Gábor Lugosi, et al. Minimizing regret with label efficient prediction, 2004, IEEE Transactions on Information Theory.
[2] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[3] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[4] E. Ordentlich, et al. Inequalities for the L1 Deviation of the Empirical Distribution, 2003.
[5] László Györfi, et al. A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.
[6] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[7] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.
[8] A. Mandelbaum, et al. Multi-armed bandits in discrete and continuous time, 1998.
[9] Csaba Szepesvári, et al. A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms, 1999, Neural Computation.
[10] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.
[11] I. Karatzas, et al. Synchronization and Optimality for Multi-armed Bandit Problems in Continuous Time, 1996.