Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning