Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning
[1] Yishay Mansour, et al. Approximate Equivalence of Markov Decision Processes, 2003, COLT.
[2] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[3] Alborz Geramifard, et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping, 2008, UAI.
[4] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[5] Ameet Talwalkar, et al. Foundations of Machine Learning, 2012, Adaptive Computation and Machine Learning.
[6] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, MIT Press.
[7] Doina Precup, et al. Representation Discovery for MDPs Using Bisimulation Metrics, 2015, AAAI.
[8] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[9] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[10] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[11] Samuel Gershman, et al. Predictive representations can link model-based reinforcement learning to model-free mechanisms, 2017, bioRxiv.
[12] Benjamin Van Roy, et al. (More) Efficient Reinforcement Learning via Posterior Sampling, 2013, NIPS.
[13] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[15] Michael L. Littman, et al. Reward-predictive representations generalize across tasks in reinforcement learning, 2019, bioRxiv.
[16] Rémi Munos, et al. Minimax Regret Bounds for Reinforcement Learning, 2017, ICML.
[17] M. Botvinick, et al. The hippocampus as a predictive map, 2016.
[18] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, 3rd Edition, Volume II, 2010.
[19] Tom Schaul, et al. Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement, 2018, ICML.
[20] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[21] Lawrence Carin, et al. Linear Feature Encoding for Reinforcement Learning, 2016, NIPS.
[22] Wolfram Burgard, et al. Deep reinforcement learning with successor features for navigation across similar environments, 2016, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[23] Richard S. Sutton. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996, NIPS.
[24] Sergey Levine, et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, 2017, ICLR.
[25] Richard Bellman. Adaptive Control Processes: A Guided Tour, 1961, Princeton University Press.
[26] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[27] Lihong Li, et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning, 2008, ICML.
[28] George Konidaris, et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis, 2011, AAAI.
[29] Satinder Singh, et al. Value Prediction Network, 2017, NIPS.
[30] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[31] Tom Schaul, et al. The Predictron: End-To-End Learning and Planning, 2016, ICML.
[32] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[33] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[34] Robert Givan, et al. Equivalence notions and model minimization in Markov decision processes, 2003, Artif. Intell.
[35] Richard S. Sutton. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[36] Doina Precup, et al. Bisimulation Metrics for Continuous Markov Decision Processes, 2011, SIAM J. Comput.
[37] Marc G. Bellemare, et al. DeepMDP: Learning Continuous Latent Space Models for Representation Learning, 2019, ICML.
[38] M. Botvinick, et al. The successor representation in human reinforcement learning, 2016, Nature Human Behaviour.
[39] Csaba Szepesvári, et al. Approximate Policy Iteration with Linear Action Models, 2012, AAAI.
[40] Joelle Pineau, et al. Combined Reinforcement Learning via Abstract Representations, 2018, AAAI.
[41] Lihong Li, et al. PAC model-free reinforcement learning, 2006, ICML.
[42] Erik Talvitie. Learning the Reward Function for a Misspecified Model, 2018, ICML.
[43] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[44] Romain Laroche, et al. On Value Function Representation of Long Horizon Problems, 2018, AAAI.
[45] Razvan Pascanu, et al. Imagination-Augmented Agents for Deep Reinforcement Learning, 2017, NIPS.
[46] Stefanie Tellex, et al. Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning, 2017, arXiv.
[47] Michael I. Jordan, et al. Is Q-learning Provably Efficient?, 2018, NeurIPS.
[48] Doina Precup, et al. Metrics for Finite Markov Decision Processes, 2004, AAAI.
[49] Erik Talvitie, et al. Self-Correcting Models for Model-Based Reinforcement Learning, 2016, AAAI.
[50] Lawson L. S. Wong, et al. State Abstraction as Compression in Apprenticeship Learning, 2019, AAAI.
[51] Samuel Gershman, et al. Deep Successor Reinforcement Learning, 2016, arXiv.
[52] Kavosh Asadi, et al. Lipschitz Continuity in Model-based Reinforcement Learning, 2018, ICML.
[53] Peter Dayan. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[54] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method, 2005, ECML.
[55] Demis Hassabis, et al. Mastering Atari, Go, chess and shogi by planning with a learned model, 2019, Nature.
[56] Michael L. Littman, et al. State Abstractions for Lifelong Reinforcement Learning, 2018, ICML.