As Expected? An Analysis of Distributional Reinforcement Learning
Distributional reinforcement learning, in which an agent predicts distributions of returns instead of their expected values, has seen empirical success in several Atari 2600 games, outperforming both the human baseline and previously state-of-the-art algorithms. It remains unclear precisely what drives this improvement in performance over traditional reinforcement learning approaches. In this paper, we take initial steps towards answering this question by determining under which conditions the distributional perspective leads to behaviour different from that of the expected case, and conversely when the two are equivalent. We supplement the theoretical findings presented in this paper with empirical results in tabular settings.
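To make the distributional perspective concrete, the following is a minimal sketch (a hypothetical illustration, not the paper's code) of a one-step categorical distributional Bellman backup on a fixed support, in the style of C51. The agent represents the return as a probability mass function over atoms rather than a single expected value; the `project` helper and all parameter values below are assumptions chosen for illustration.

```python
import numpy as np

def project(atoms, probs, reward, gamma):
    """Project the shifted/scaled return distribution back onto the fixed support."""
    new_probs = np.zeros_like(probs)
    v_min, v_max = atoms[0], atoms[-1]
    dz = atoms[1] - atoms[0]
    for z, p in zip(atoms, probs):
        tz = np.clip(reward + gamma * z, v_min, v_max)  # Bellman target atom
        b = (tz - v_min) / dz                           # fractional index on the support
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:
            new_probs[lo] += p
        else:  # split mass between neighbouring atoms, preserving the mean
            new_probs[lo] += p * (hi - b)
            new_probs[hi] += p * (b - lo)
    return new_probs

atoms = np.linspace(0.0, 10.0, 11)   # fixed return support
probs = np.full(11, 1.0 / 11)        # uniform initial return belief, mean 5.0
target = project(atoms, probs, reward=1.0, gamma=0.9)

print(target.sum())             # ≈ 1.0: still a probability distribution
print((atoms * target).sum())   # ≈ 1.0 + 0.9 * 5.0 = 5.5
```

Note that when no clipping at the support boundaries occurs, the mean of the projected distribution coincides with the ordinary expected Bellman target r + γE[Z]; divergence from expected-value behaviour arises only through effects such as support truncation, which is the kind of condition the paper's analysis makes precise.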