[1] Akimichi Takemura, et al. An Asymptotically Optimal Bandit Algorithm for Bounded Support Models, 2010, COLT.
[2] Michael N. Katehakis, et al. An Asymptotically Optimal UCB Policy for Uniform Bandits of Unknown Support, 2015, ArXiv.
[3] A. Burnetas, et al. Optimal Adaptive Policies for Sequential Allocation Problems, 1996.
[4] Emilie Kaufmann, et al. Analysis of Bayesian and Frequentist Strategies for Sequential Resource Allocation, 2014.
[5] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[6] Michael Z. Zgurovsky, et al. Convergence of value iterations for total-cost MDPs and POMDPs with general state and action sets, 2014, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[7] M. Katehakis, et al. Simple Policies with (a.s.) Arbitrarily Slow Growing Regret for Sequential Allocation Problems, 2015.
[8] Lihong Li, et al. On Minimax Optimal Offline Policy Evaluation, 2014, ArXiv.
[9] Aurélien Garivier, et al. Optimism in Reinforcement Learning Based on Kullback-Leibler Divergence, 2010, ArXiv.
[10] Benjamin Van Roy, et al. Near-optimal Reinforcement Learning in Factored MDPs, 2014, NIPS.
[11] Warren B. Powell, et al. Asymptotically optimal Bayesian sequential change detection and identification rules, 2013, Ann. Oper. Res.
[12] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[13] A. Burnetas, et al. Asymptotic Bayes Analysis for the Finite-Horizon One-Armed-Bandit Problem, 2003, Probability in the Engineering and Informational Sciences.
[14] R. Munos, et al. Kullback-Leibler upper confidence bounds for optimal sequential allocation, 2012, arXiv:1210.1136.
[15] H. Robbins, et al. Sequential choice from several populations, 1995, Proceedings of the National Academy of Sciences of the United States of America.
[16] Michael N. Katehakis, et al. The Multi-Armed Bandit Problem: Decomposition and Computation, 1987, Math. Oper. Res.
[17] Akimichi Takemura, et al. Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors, 2013, AISTATS.
[18] Ambuj Tewari, et al. Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs, 2007, NIPS.
[19] M. Katehakis, et al. Multi-Armed Bandits under General Depreciation and Commitment, 2014, Probability in the Engineering and Informational Sciences.
[20] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[21] Aleksandrs Slivkins, et al. The Best of Both Worlds: Stochastic and Adversarial Bandits, 2012, COLT.
[22] R. Weber. On the Gittins Index for Multiarmed Bandits, 1992.
[23] Apostolos Burnetas, et al. On Sequencing Two Types of Tasks on a Single Processor under Incomplete Information, 1993, Probability in the Engineering and Informational Sciences.
[24] Akimichi Takemura, et al. An asymptotically optimal policy for finite support models in the multiarmed bandit problem, 2009, Machine Learning.
[25] Michael N. Katehakis, et al. Asymptotic Behavior of Minimal-Exploration Allocation Policies: Almost Sure, Arbitrarily Slow Growing Regret, 2015, ArXiv.
[26] Michael L. Littman, et al. Inducing Partially Observable Markov Decision Processes, 2012, ICGI.
[27] Apostolos Burnetas, et al. Optimal Adaptive Policies for Markov Decision Processes, 1997, Math. Oper. Res.
[28] A. Burnetas, et al. Dynamic allocation policies for the finite horizon one armed bandit problem, 1998.
[29] Panos M. Pardalos, et al. Cooperative Control: Models, Applications, and Algorithms, 2003.
[30] J. Gittins. Bandit processes and dynamic allocation indices, 1979.
[31] Mingyan Liu, et al. Approximately optimal adaptive learning in opportunistic spectrum access, 2012, Proceedings IEEE INFOCOM.
[32] Michael N. Katehakis, et al. Computing Optimal Sequential Allocation Rules in Clinical Trials, 1986.
[33] Peter Auer, et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, 2010, Period. Math. Hung.
[34] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.
[35] Csaba Szepesvári, et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, 2009, Theor. Comput. Sci.
[36] Uriel G. Rothblum, et al. The multi-armed bandit, with constraints, 2012, Annals of Operations Research.
[37] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[38] Wassim Jouini, et al. Multi-armed bandit based policies for cognitive radio's decision making issues, 2009, 3rd International Conference on Signals, Circuits and Systems (SCS).
[39] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[40] Apostolos Burnetas, et al. On large deviations properties of sequential allocation problems, 1996.