Alessandro Lazaric | Simon S. Du | Matteo Pirotta | Jean Tarbouriech | Runlong Zhou | Michal Valko
[1] Xiangyang Ji, et al. Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon, 2021, COLT.
[2] Yishay Mansour, et al. Online Convex Optimization in Adversarial Markov Decision Processes, 2019, ICML.
[3] Yishay Mansour, et al. Stochastic Shortest Path with Adversarially Changing Costs, 2021, IJCAI.
[4] Alessandro Lazaric, et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, 2018, ICML.
[5] Xiaoyu Chen, et al. Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP, 2019, ICLR.
[6] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[7] Haipeng Luo, et al. Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition, 2020, COLT.
[8] Yishay Mansour, et al. Minimax Regret for Stochastic Shortest Path, 2021, NeurIPS.
[9] Xiangyang Ji, et al. Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity, 2020, ICML.
[10] Dimitri P. Bertsekas, et al. Stochastic Shortest Path Problems Under Weak Conditions, 2013.
[11] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[12] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[13] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[14] Max Simchowitz, et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, 2019, NeurIPS.
[15] Alessandro Lazaric, et al. No-Regret Exploration in Goal-Oriented Reinforcement Learning, 2020, ICML.
[16] Gergely Neu, et al. A Unifying View of Optimism in Episodic Reinforcement Learning, 2020, NeurIPS.
[17] Xiangyang Ji, et al. Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition, 2020, NeurIPS.
[18] Haipeng Luo, et al. Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition, 2020, NeurIPS.
[19] Shie Mannor, et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies, 2019, NeurIPS.
[20] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[21] Haipeng Luo, et al. Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case, 2021, ICML.
[22] Tengyu Ma, et al. Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap, 2021, COLT.
[23] Haipeng Luo, et al. Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition, 2020, ICML.
[24] Blai Bonet, et al. On the Speed of Convergence of Value Iteration on Stochastic Shortest-Path Problems, 2007, Math. Oper. Res.
[25] Alessandro Lazaric, et al. Improved Sample Complexity for Incremental Autonomous Exploration in MDPs, 2020, NeurIPS.
[26] Xiangyang Ji, et al. Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP, 2021, arXiv.
[27] Yishay Mansour, et al. Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function, 2019, NeurIPS.
[28] Dimitri P. Bertsekas, et al. On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems, 2013, Math. Oper. Res.
[29] Michal Valko, et al. UCB Momentum Q-learning: Correcting the bias without forgetting, 2021, ICML.
[30] Nan Jiang, et al. Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon, 2018, COLT.
[31] Hector Geffner, et al. Heuristic Search for Generalized Stochastic Shortest Path MDPs, 2011, ICAPS.
[32] András György, et al. The adversarial stochastic shortest path problem with unknown transition probabilities, 2012, AISTATS.
[33] Peter Auer, et al. Autonomous Exploration For Navigating In MDPs, 2012, COLT.
[34] Dimitri P. Bertsekas, et al. Linear network optimization: algorithms and codes, 1991.
[35] Alessandro Lazaric, et al. Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs, 2019, NeurIPS.
[36] Alessandro Lazaric, et al. Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model, 2021, ALT.
[37] Ruosong Wang, et al. Is Long Horizon RL More Difficult Than Short Horizon RL?, 2020, NeurIPS.
[38] John N. Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems, 1991, Math. Oper. Res.
[39] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[40] Gautier Stauffer, et al. The Stochastic Shortest Path Problem: A polyhedral combinatorics perspective, 2017, Eur. J. Oper. Res.
[41] Haipeng Luo, et al. Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes, 2020, ICML.
[42] Haim Kaplan, et al. Near-optimal Regret Bounds for Stochastic Shortest Path, 2020, ICML.
[43] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.