Successor Features for Transfer in Reinforcement Learning
André Barreto | Will Dabney | Rémi Munos | Jonathan J. Hunt | Tom Schaul | Hado van Hasselt | David Silver
[1] Peter Dayan. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[2] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994, Wiley.
[3] Richard S. Sutton. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996, NIPS.
[4] Dimitri P. Bertsekas, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[5] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition, 2000, J. Artif. Intell. Res.
[6] Christopher G. Atkeson, et al. A comparison of direct and model-based reinforcement learning, 1997, Proceedings of International Conference on Robotics and Automation.
[7] Rich Caruana. Multitask Learning, 1997, Machine Learning.
[8] Richard S. Sutton, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[9] Daniel S. Bernstein. Reusing Old Policies to Accelerate Learning on New MDPs, 1999.
[10] Jonathan Baxter. A Model of Inductive Bias Learning, 2000, J. Artif. Intell. Res.
[11] Andrew Y. Ng, et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.
[12] Michael L. Littman, et al. Predictive Representations of State, 2001, NIPS.
[13] Andrew G. Barto, et al. Recent Advances in Hierarchical Reinforcement Learning, 2003, Discret. Event Dyn. Syst.
[14] Christopher J. C. H. Watkins, et al. Q-learning, 1992, Machine Learning.
[15] Trevor Hastie, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2001, Springer.
[16] Justin A. Boyan. Technical Update: Least-Squares Temporal Difference Learning, 2002, Machine Learning.
[17] Richard Bellman. Dynamic Programming, 1957, Princeton University Press.
[18] Alexander L. Strehl, et al. A theoretical analysis of Model-Based Interval Estimation, 2005, ICML.
[19] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[20] Sriraam Natarajan, et al. Dynamic preferences in multi-criteria reinforcement learning, 2005, ICML.
[21] Mehran Asadi, et al. Effective Control Knowledge Transfer through Learning Skill and Representation Hierarchies, 2007, IJCAI.
[22] Andreas Argyriou, et al. Convex multi-task feature learning, 2008, Machine Learning.
[23] Marek Petrik. An Analysis of Laplacian Methods for Value Function Approximation in MDPs, 2007, IJCAI.
[24] Ronald Parr, et al. Analyzing feature generation for value-function approximation, 2007, ICML.
[25] Sridhar Mahadevan, et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, 2007, J. Mach. Learn. Res.
[26] Neville Mehta, et al. Transfer in variable-reward hierarchical reinforcement learning, 2008, Machine Learning.
[27] Ronald Parr, et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning, 2008, ICML.
[28] Matthew E. Taylor, et al. Transfer Learning for Reinforcement Learning Domains: A Survey, 2009, J. Mach. Learn. Res.
[29] Fernando Fernández, et al. Probabilistic Policy Reuse for inter-task transfer learning, 2010, Robotics Auton. Syst.
[30] Csaba Szepesvári. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[31] Richard S. Sutton, et al. Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction, 2011, AAMAS.
[32] Alessandro Lazaric. Transfer in Reinforcement Learning: A Framework and a Survey, 2012, Reinforcement Learning: State of the Art.
[33] Emanuel Todorov, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[34] Tomas Mikolov, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[35] Hengshuai Yao, et al. Universal Option Models, 2014, NIPS.
[36] Joseph Modayil, et al. Multi-timescale nexting in a reinforcement learning robot, 2011, Adapt. Behav.
[37] Tom Schaul, et al. Universal Value Function Approximators, 2015, ICML.
[38] Diederik P. Kingma, et al. Adam: A Method for Stochastic Optimization, 2015, ICLR.
[40] Volodymyr Mnih, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[41] Timothy P. Lillicrap, et al. Continuous control with deep reinforcement learning, 2016, ICLR.
[42] Tejas D. Kulkarni, et al. Deep Successor Reinforcement Learning, 2016, ArXiv.
[43] Jingwei Zhang, et al. Deep reinforcement learning with successor features for navigation across similar environments, 2017, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).