Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning