Removing the Target Network from Deep Q-Networks with the Mellowmax Operator
暂无分享,去创建一个
[1] J. Andrew Bagnell,et al. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010 .
[2] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res..
[3] Csaba Szepesvári,et al. A Generalized Reinforcement-Learning Model: Convergence and Applications , 1996, ICML.
[4] Vicenç Gómez,et al. A unified view of entropy-regularized Markov decision processes , 2017, ArXiv.
[5] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[6] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[7] Richard S. Sutton,et al. True Online TD(lambda) , 2014, ICML.
[8] Dale Schuurmans,et al. Bridging the Gap Between Value and Policy Based Reinforcement Learning , 2017, NIPS.
[9] Le Song,et al. SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation , 2017, ICML.
[10] Roy Fox,et al. Taming the Noise in Reinforcement Learning via Soft Updates , 2015, UAI.
[11] Kavosh Asadi,et al. An Alternative Softmax Operator for Reinforcement Learning , 2016, ICML.
[12] Richard S. Sutton,et al. Learning to Predict Independent of Span , 2015, ArXiv.