Meta-Gradient Reinforcement Learning
Zhongwen Xu | Hado van Hasselt | David Silver
[1] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.
[2] Richard S. Sutton, et al. Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta, 1992, AAAI.
[3] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[4] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[5] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[6] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[7] Nicol N. Schraudolph, et al. Local Gain Adaptation in Stochastic Gradient Descent, 1999.
[8] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[9] Michael Kearns, et al. Bias-Variance Error Bounds for Temporal Difference Updates, 2000, COLT.
[10] Sepp Hochreiter, et al. Learning to Learn Using Gradient Descent, 2001, ICANN.
[11] Nuttapong Chentanez, et al. Intrinsically Motivated Reinforcement Learning, 2004, NIPS.
[12] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[13] Peter Dayan, et al. Analytical Mean Squared Error Curves for Temporal Difference Learning, 1996, Machine Learning.
[14] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[15] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[16] Shimon Whiteson, et al. A theoretical and empirical analysis of Expected Sarsa, 2009, IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[17] Hado van Hasselt, et al. Double Q-learning, 2010, NIPS.
[18] Scott Sanner, et al. Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda, 2010, ICML.
[19] P. Thomas, et al. TDγ: Re-evaluating Complex Backups in Temporal Difference Learning, 2011.
[20] Scott Niekum, et al. TDγ: Re-evaluating Complex Backups in Temporal Difference Learning, 2011, NIPS.
[21] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[22] Alex Graves, et al. Playing Atari with Deep Reinforcement Learning, 2013, ArXiv.
[23] Richard S. Sutton, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014.
[24] Doina Precup, et al. A new Q(λ) with interim forward view and Monte Carlo equivalence, 2014, ICML.
[25] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[26] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[27] Shane Legg, et al. Massively Parallel Methods for Deep Reinforcement Learning, 2015, ArXiv.
[28] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[29] Ryan P. Adams, et al. Gradient-based Hyperparameter Optimization through Reversible Learning, 2015, ICML.
[30] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res..
[31] Marc G. Bellemare, et al. Q(λ) with Off-Policy Corrections, 2016, ALT.
[32] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[34] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.
[35] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[36] Tom Schaul, et al. Dueling Network Architectures for Deep Reinforcement Learning, 2015, ICML.
[37] Martha White, et al. A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning, 2016, AAMAS.
[38] Fabian Pedregosa, et al. Hyperparameter optimization with approximate gradient, 2016, ICML.
[39] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[40] Peter L. Bartlett, et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning, 2016, ArXiv.
[41] Tom Schaul, et al. Prioritized Experience Replay, 2015, ICLR.
[42] Martha White, et al. An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning, 2015, J. Mach. Learn. Res..
[43] Marc G. Bellemare, et al. Safe and Efficient Off-Policy Reinforcement Learning, 2016, NIPS.
[44] Zeb Kurth-Nelson, et al. Learning to reinforcement learn, 2016, CogSci.
[45] Sergey Levine, et al. One-Shot Visual Imitation Learning via Meta-Learning, 2017, CoRL.
[46] C. A. Nelson, et al. Learning to Learn, 2017, Encyclopedia of Machine Learning and Data Mining.
[47] Ashique Mahmood, et al. Incremental Off-policy Reinforcement Learning Algorithms, 2017.
[48] Max Jaderberg, et al. Population Based Training of Neural Networks, 2017, ArXiv.
[49] Misha Denil, et al. Learned Optimizers that Scale and Generalize, 2017, ICML.
[50] Marc G. Bellemare, et al. A Distributional Perspective on Reinforcement Learning, 2017, ICML.
[51] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[52] Tom Schaul, et al. Reinforcement Learning with Unsupervised Auxiliary Tasks, 2016, ICLR.
[53] Paolo Frasconi, et al. Forward and Reverse Gradient-Based Hyperparameter Optimization, 2017, ICML.
[54] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[55] Kenji Doya, et al. Online meta-learning by parallel algorithm competition, 2018, GECCO.
[56] Sergey Levine, et al. Meta-Learning and Universality: Deep Representations and Gradient Descent can Approximate any Learning Algorithm, 2017, ICLR.
[57] Satinder Singh, et al. On Learning Intrinsic Rewards for Policy Gradient Methods, 2018, NeurIPS.
[58] Thomas L. Griffiths, et al. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes, 2018, ICLR.
[59] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[60] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[61] Pieter Abbeel, et al. Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments, 2017, ICLR.