Paper Information - Refined Analysis of FPL for Adversarial Markov Decision Processes

Refined Analysis of FPL for Adversarial Markov Decision Processes

We consider the adversarial Markov Decision Process (MDP) problem, in which the rewards of the MDP can be chosen adversarially and the transition function can be either known or unknown. In both settings, Follow-the-Perturbed-Leader (FPL) based algorithms have been proposed in the literature. However, the established regret bounds for FPL-based algorithms are worse than those of algorithms based on mirror descent. We improve the analysis of FPL-based algorithms in both settings, matching the current best regret bounds using faster and simpler algorithms.

Yuanhao Wang | Kefan Dong
