[1] David Haussler, et al. How to use expert advice, 1993, STOC.
[2] Yishay Mansour, et al. Online Convex Optimization in Adversarial Markov Decision Processes, 2019, ICML.
[3] András György, et al. The adversarial stochastic shortest path problem with unknown transition probabilities, 2012, AISTATS.
[4] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, J. Comput. Syst. Sci.
[5] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[6] Yishay Mansour, et al. Online Markov Decision Processes, 2009, Math. Oper. Res.
[7] Gábor Lugosi, et al. Prediction, learning, and games, 2006.
[8] Gergely Neu, et al. Online learning in episodic Markovian decision processes by relative entropy policy search, 2013, NIPS.
[9] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[10] Haipeng Luo, et al. Learning Adversarial MDPs with Bandit Feedback and Unknown Transition, 2019, ArXiv.