Adaptive Sampling for Best Policy Identification in Markov Decision Processes
[1] Wouter M. Koolen et al. Mixture Martingales Revisited with Applications to Sequential Tests and Confidence Intervals, 2018, J. Mach. Learn. Res.
[2] Hilbert J. Kappen et al. On the Sample Complexity of Reinforcement Learning with a Generative Model, 2012, ICML.
[3] Aurélien Garivier et al. Optimal Best Arm Identification with Fixed Confidence, 2016, COLT.
[4] Aurélien Garivier et al. Non-Asymptotic Sequential Tests for Overlapping Hypotheses and Application to Near-Optimal Arm Identification in Bandit Models, 2019.
[5] Shie Mannor et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, 2006, J. Mach. Learn. Res.
[6] Yuxin Chen et al. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model, 2020, NeurIPS.
[7] R. Reiss. Approximate Distributions of Order Statistics: With Applications to Nonparametric Statistics, 1989.
[8] Anders Jonsson et al. Planning in Markov Decision Processes with Gap-Dependent Sample Complexity, 2020, NeurIPS.
[9] Sham M. Kakade. On the Sample Complexity of Reinforcement Learning, PhD thesis, University College London, 2003.
[10] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Adv. Appl. Math.
[11] Aurélien Garivier et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models, 2014, J. Mach. Learn. Res.
[12] Xian Wu et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model, 2018, NeurIPS.
[13] Mykel J. Kochenderfer et al. Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model, 2019, NeurIPS.
[14] Michael Kearns et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[15] H. Chernoff. Sequential Design of Experiments, 1959, Ann. Math. Statist.
[16] Lin F. Yang et al. Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, 2020, COLT.