Stochastic Shortest Path with Adversarially Changing Costs
[1] Haipeng Luo, et al. Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition, 2020, COLT.
[2] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[3] Shie Mannor, et al. Optimistic Policy Optimization with Bandit Feedback, 2020, ICML.
[4] F. d'Epenoux. A Probabilistic Production and Inventory Problem, 1963.
[5] Shie Mannor, et al. Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies, 2019, NeurIPS.
[6] Shie Mannor, et al. Markov Decision Processes with Arbitrary Reward Processes, 2008, Math. Oper. Res.
[7] Gergely Neu, et al. Online learning in episodic Markovian decision processes by relative entropy policy search, 2013, NIPS.
[8] Yishay Mansour, et al. Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function, 2019, NeurIPS.
[10] Emma Brunskill, et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds, 2019, ICML.
[11] Alessandro Lazaric, et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning, 2018, ICML.
[12] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[13] Haipeng Luo, et al. Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition, 2020, ICML.
[14] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[15] Haipeng Luo, et al. Learning Adversarial MDPs with Bandit Feedback and Unknown Transition, 2019, arXiv.
[16] John N. Tsitsiklis, et al. An Analysis of Stochastic Shortest Path Problems, 1991, Math. Oper. Res.
[17] Ambuj Tewari, et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs, 2009, UAI.
[18] András György, et al. The adversarial stochastic shortest path problem with unknown transition probabilities, 2012, AISTATS.
[20] Chi Jin, et al. Provably Efficient Exploration in Policy Optimization, 2019, ICML.
[21] A. S. Manne. Linear Programming and Sequential Decisions, 1960.
[22] Haim Kaplan, et al. Near-optimal Regret Bounds for Stochastic Shortest Path, 2020, ICML.
[23] Benjamin Van Roy, et al. Generalization and Exploration via Randomized Value Functions, 2014, ICML.
[24] Alessandro Lazaric, et al. No-Regret Exploration in Goal-Oriented Reinforcement Learning, 2020, ICML.
[25] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res.
[27] Stephen P. Boyd, et al. Convex Optimization, 2004, Cambridge University Press.
[28] Tor Lattimore, et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[29] Csaba Szepesvári, et al. Online Markov Decision Processes Under Bandit Feedback, 2010, IEEE Transactions on Automatic Control.
[30] Yishay Mansour, et al. Online Convex Optimization in Adversarial Markov Decision Processes, 2019, ICML.
[31] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.