Statistics and Samples in Distributional Reinforcement Learning

We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution. Our key insight is that DRL algorithms can be decomposed into a statistical estimator and a method for imputing a return distribution consistent with the estimated statistics. With this new understanding, we are able to provide improved analyses of existing DRL algorithms as well as construct a new algorithm (EDRL) based upon estimation of the expectiles of the return distribution. We compare EDRL with existing methods on a variety of MDPs to illustrate concrete aspects of our analysis, and develop a deep RL variant of the algorithm, ER-DQN, which we evaluate on the Atari-57 suite of games.
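The statistic behind EDRL is the expectile. The τ-expectile of a return Z is the minimiser of the asymmetric squared loss E[|τ − 1{Z < e}|(Z − e)²], so it is the root of that loss's first-order condition τ·E[(Z − e)₊] − (1 − τ)·E[(e − Z)₊] = 0. The sketch below assumes a finite sample of returns stands in for the return distribution. Because the residual is non-increasing in e, the root can be found by bisection. The names `expectile`, `expectile_residual` and `expectile_sgd_step` are hypothetical illustrations, not the authors' code. The step function only assumes the general form of a sample-based update, and the imputation step (recovering samples whose expectiles match a given set) is not shown.

```python
from typing import Sequence


def expectile_residual(samples: Sequence[float], tau: float, e: float) -> float:
    """First-order condition of the asymmetric squared loss at candidate e.

    Returns tau * mean((x - e)_+) - (1 - tau) * mean((e - x)_+), which is
    non-increasing in e and equals zero at the tau-expectile.
    """
    n = len(samples)
    upper = sum(max(x - e, 0.0) for x in samples) / n
    lower = sum(max(e - x, 0.0) for x in samples) / n
    return tau * upper - (1.0 - tau) * lower


def expectile(samples: Sequence[float], tau: float, iters: int = 100) -> float:
    """tau-expectile of the empirical distribution of `samples`, via bisection."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    # The residual is >= 0 at min(samples) and <= 0 at max(samples),
    # so the expectile lies in this bracket.
    lo, hi = min(samples), max(samples)
    for _ in range(iters):  # fixed count avoids stalling at float precision
        mid = 0.5 * (lo + hi)
        if expectile_residual(samples, tau, mid) > 0.0:
            lo = mid  # residual still positive: the root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def expectile_sgd_step(e: float, z: float, tau: float, lr: float) -> float:
    """One stochastic-gradient step of estimate e toward a sampled target z.

    Hypothetical illustration: the asymmetric weight is tau when the target
    lies above the estimate and (1 - tau) otherwise; the factor 2 from the
    gradient of the squared loss is folded into the learning rate lr.
    """
    weight = tau if z > e else 1.0 - tau
    return e + lr * weight * (z - e)


if __name__ == "__main__":
    returns = [0.0, 1.0, 1.0, 4.0]
    for tau in (0.1, 0.5, 0.9):
        # tau = 0.5 recovers the sample mean (1.5 for these returns)
        print(tau, round(expectile(returns, tau), 4))
```

With τ = 0.5 the two weights are equal, so the first-order condition reduces to mean(Z) − e = 0 and the expectile is the mean. Other values of τ tilt the estimate toward the upper or lower tail. Bisection is chosen here only because the monotone residual guarantees convergence without a step size to tune.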

Marc G. Bellemare | Rémi Munos | Saurabh Kumar | Mark Rowland | Will Dabney | Robert Dadashi
