AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Reinforcement Learning with Near-Optimal Sample Complexity

In this paper, we propose AsyncQVI, an asynchronous-parallel Q-value iteration algorithm for reinforcement learning. Given a problem with $|\mathcal{S}|$ states, $|\mathcal{A}|$ actions, and a discount factor $\gamma\in(0,1)$, AsyncQVI uses $\mathcal{O}(|\mathcal{S}|)$ memory and returns an $\varepsilon$-optimal policy with probability at least $1-\delta$ using $$\tilde{\mathcal{O}}\bigg(\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^2}\log\Big(\frac{1}{\delta}\Big)\bigg)$$ samples, which nearly matches the theoretical lower bound. AsyncQVI is also the first asynchronous-parallel reinforcement learning algorithm with both a convergence rate and a sample complexity guarantee. Its low memory footprint and parallelism make it suitable for large-scale applications. In numerical tests, we compare AsyncQVI with four sample-based value iteration methods; the results show that AsyncQVI is highly efficient and achieves linear parallel speedup.
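To make the sample-based, asynchronous update pattern concrete, below is a minimal Python sketch of the kind of computation the abstract describes. It is an illustrative approximation, not the paper's exact algorithm: the generative-model interface `sample_next_state(s, a)`, the reward function `reward(s, a)`, and all parameter names are assumptions made for the example. Each worker repeatedly picks a state, estimates the Bellman backup from generative-model samples using a possibly stale copy of the shared $\mathcal{O}(|\mathcal{S}|)$ value table, and writes back a single entry without locking.

```python
import threading

import numpy as np


# Illustrative sketch only: asynchronous, sample-based Q-value iteration with a
# shared O(|S|) value table. `sample_next_state(s, a)` and `reward(s, a)` are
# assumed interfaces to a generative model; they are not from the paper.
def async_qvi(num_states, num_actions, sample_next_state, reward,
              gamma=0.9, num_samples=100, total_updates=10000, num_workers=4):
    V = np.zeros(num_states)                  # shared value table, one entry per state
    policy = np.zeros(num_states, dtype=int)  # greedy action recorded for each state

    def worker():
        rng = np.random.default_rng()
        for _ in range(total_updates // num_workers):
            s = int(rng.integers(num_states))  # state whose entry this worker updates
            best_q, best_a = -np.inf, 0
            for a in range(num_actions):
                # Monte Carlo estimate of E[V(s')] from generative-model samples,
                # read from a possibly stale copy of the shared table (asynchrony).
                next_states = [sample_next_state(s, a) for _ in range(num_samples)]
                q = reward(s, a) + gamma * V[next_states].mean()
                if q > best_q:
                    best_q, best_a = q, a
            V[s] = best_q          # lock-free write of a single entry
            policy[s] = best_a

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return V, policy
```

The lock-free, per-entry writes are what make the scheme asynchronous-parallel: workers never wait for one another, at the cost of occasionally reading stale values.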
