Online meta-learning by parallel algorithm competition
[1] Peter I. Frazier, et al. The Parallel Knowledge Gradient Method for Batch Bayesian Optimization, 2016, NIPS.
[2] Dimitri P. Bertsekas, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1997.
[3] Prabhat, et al. Scalable Bayesian Optimization Using Deep Neural Networks, 2015, ICML.
[4] Shie Mannor, et al. Adaptive Lambda Least-Squares Temporal Difference Learning, 2016, ArXiv:1612.09465.
[5] Heidi Burgiel, et al. How to lose at Tetris, 1997, The Mathematical Gazette.
[6] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[7] Junichiro Yoshimoto, et al. Control of exploitation-exploration meta-parameter in reinforcement learning, 2002, Neural Networks.
[8] Kenji Doya, et al. Evolution of meta-parameters in reinforcement learning algorithm, 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003).
[9] Kenji Doya, et al. Meta-learning in Reinforcement Learning, 2003, Neural Networks.
[10] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[11] James E. Baker, et al. Reducing Bias and Inefficiency in the Selection Algorithm, 1987, ICGA.
[12] Damien Ernst, et al. How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies, 2015, ArXiv.
[13] Aaron Klein, et al. Bayesian Optimization with Robust Bayesian Neural Networks, 2016, NIPS.
[14] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[15] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[16] Shane Legg, et al. Massively Parallel Methods for Deep Reinforcement Learning, 2015, ArXiv.
[17] Kunikazu Kobayashi, et al. A Meta-learning Method Based on Temporal Difference Error, 2009, ICONIP.
[18] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[19] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[20] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996.
[21] Henrik I. Christensen, et al. Co-evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning, 2008, Adapt. Behav.
[22] Kevin Leyton-Brown, et al. Sequential Model-Based Optimization for General Algorithm Configuration, 2011, LION.
[23] Henrik I. Christensen, et al. Darwinian embodied evolution of the learning ability for survival, 2011, Adapt. Behav.
[24] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.
[25] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[26] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[27] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[28] Bruno Scherrer, et al. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris, 2013, NIPS.
[29] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[30] Friedhelm Schwenker, et al. Neural Network Ensembles in Reinforcement Learning, 2013, Neural Processing Letters.
[31] Richard S. Sutton, et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, 1992, AAAI.
[32] Kenji Doya, et al. Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning, 2017, Neural Networks.
[33] S. Ioffe, et al. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, 1996.
[34] Scott Sanner, et al. Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda, 2010, ICML.
[35] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.