Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition