UvA-DARE (Digital Academic Repository) Stochastic Activation Actor Critic Methods