Yishay Mansour | Alon Cohen | Yonathan Efroni | Aviv Rosenberg
[1] Yishay Mansour, et al. Online Convex Optimization in Adversarial Markov Decision Processes, 2019, ICML.
[2] Haipeng Luo, et al. Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition, 2020, NeurIPS.
[3] Shie Mannor, et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies, 2019, NeurIPS.
[4] Shie Mannor, et al. Optimistic Policy Optimization with Bandit Feedback, 2020, ICML.
[5] Gergely Neu, et al. Online learning in episodic Markovian decision processes by relative entropy policy search, 2013, NIPS.
[6] Alessandro Lazaric, et al. Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret, 2021, NeurIPS.
[7] Xiangyang Ji, et al. Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon, 2021, COLT.
[8] Alessandro Lazaric, et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, 2018, ICML.
[9] Chi Jin, et al. Provably Efficient Exploration in Policy Optimization, 2019, ICML.
[10] Alessandro Lazaric, et al. Learning Near Optimal Policies with Low Inherent Bellman Error, 2020, ICML.
[11] Yishay Mansour, et al. Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function, 2019, NeurIPS.
[12] Yishay Mansour, et al. Learning Adversarial Markov Decision Processes with Delayed Feedback, 2020, AAAI.
[13] Haim Kaplan, et al. Near-optimal Regret Bounds for Stochastic Shortest Path, 2020, ICML.
[14] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[15] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[16] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[17] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[18] Michael I. Jordan, et al. Provably Efficient Reinforcement Learning with Linear Function Approximation, 2019, COLT.
[19] Yishay Mansour, et al. Stochastic Shortest Path with Adversarially Changing Costs, 2021, IJCAI.
[20] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[21] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[22] Mengdi Wang, et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features, 2019, ICML.
[23] Max Simchowitz, et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs, 2019, NeurIPS.
[24] Alessandro Lazaric, et al. No-Regret Exploration in Goal-Oriented Reinforcement Learning, 2020, ICML.
[25] Haipeng Luo, et al. Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition, 2020, COLT.
[26] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[27] Haipeng Luo, et al. Learning Adversarial MDPs with Bandit Feedback and Unknown Transition, 2019, arXiv.
[28] John N. Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems, 1991, Math. Oper. Res.
[29] Lihong Li, et al. Policy Certificates: Towards Accountable Reinforcement Learning, 2018, ICML.
[30] Haipeng Luo, et al. Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case, 2021, ICML.
[31] Shie Mannor, et al. Confidence-Budget Matching for Sequential Budgeted Learning, 2021, ICML.
[32] András György, et al. The adversarial stochastic shortest path problem with unknown transition probabilities, 2012, AISTATS.
[33] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[34] Alessandro Lazaric, et al. Frequentist Regret Bounds for Randomized Least-Squares Value Iteration, 2020, AISTATS.
[35] Shie Mannor, et al. Reinforcement Learning with Trajectory Feedback, 2020, arXiv.
[36] Haipeng Luo, et al. Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs, 2020, NeurIPS.
[37] Haipeng Luo, et al. Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition, 2020, ICML.