暂无分享,去创建一个
[1] D. Freedman. On Tail Probabilities for Martingales , 1975 .
[2] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[3] Tom Schaul,et al. Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.
[4] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[5] Shipra Agrawal,et al. Optimistic posterior sampling for reinforcement learning: worst-case regret bounds , 2022, NIPS.
[6] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[7] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[8] Sham M. Kakade,et al. Variance Reduction Methods for Sublinear Reinforcement Learning , 2018, ArXiv.
[9] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[10] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[11] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[12] Xiangyang Ji,et al. Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function , 2019, NeurIPS.
[13] Sergey Levine,et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.
[14] Yu Bai,et al. Provably Efficient Q-Learning with Low Switching Cost , 2019, NeurIPS.
[15] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[16] Xian Wu,et al. Variance reduced value iteration and faster algorithms for solving Markov decision processes , 2017, SODA.
[17] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[18] Apostolos Burnetas,et al. Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res..
[19] Xiaoyu Chen,et al. Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP , 2019, ICLR.
[20] Xian Wu,et al. Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model , 2018, NeurIPS.
[21] Yi Ouyang,et al. Learning Unknown Markov Decision Processes: A Thompson Sampling Approach , 2017, NIPS.
[22] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[23] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[24] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[25] Max Simchowitz,et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs , 2019, NeurIPS.
[26] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.