Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning
[1] Lawrence Carin,et al. Linear Feature Encoding for Reinforcement Learning , 2016, NIPS.
[2] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[3] Yishay Mansour,et al. Approximate Equivalence of Markov Decision Processes , 2003, COLT.
[4] Ameet Talwalkar,et al. Foundations of Machine Learning , 2012, Adaptive Computation and Machine Learning.
[5] Michael L. Littman,et al. Near Optimal Behavior via Approximate State Abstraction , 2016, ICML.
[6] Romain Laroche,et al. On Value Function Representation of Long Horizon Problems , 2018, AAAI.
[7] Doina Precup,et al. Metrics for Finite Markov Decision Processes , 2004, AAAI.
[8] Thomas J. Walsh,et al. Towards a Unified Theory of State Abstraction for MDPs , 2006, AI&M.
[9] Richard S. Sutton. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996, NIPS.
[10] Marc G. Bellemare,et al. DeepMDP: Learning Continuous Latent Space Models for Representation Learning , 2019, ICML.
[11] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, MIT Press.
[12] Tom Schaul,et al. Successor Features for Transfer in Reinforcement Learning , 2016, NIPS.
[13] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, 3rd Edition, Volume II , 2010, Athena Scientific.
[14] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res.
[15] Michael L. Littman,et al. Reward-predictive representations generalize across tasks in reinforcement learning , 2019, bioRxiv.
[16] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res.
[17] M. Botvinick,et al. The successor representation in human reinforcement learning , 2016, Nature Human Behaviour.
[18] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[19] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.
[20] Lawson L. S. Wong,et al. State Abstraction as Compression in Apprenticeship Learning , 2019, AAAI.
[21] Erik Talvitie. Learning the Reward Function for a Misspecified Model , 2018, ICML.
[22] Tom Schaul,et al. The Predictron: End-To-End Learning and Planning , 2016, ICML.
[23] Kavosh Asadi,et al. Lipschitz Continuity in Model-based Reinforcement Learning , 2018, ICML.
[24] Erik Talvitie,et al. Self-Correcting Models for Model-Based Reinforcement Learning , 2016, AAAI.
[25] Samuel Gershman,et al. Deep Successor Reinforcement Learning , 2016, ArXiv.
[26] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[27] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[28] M. Botvinick,et al. The hippocampus as a predictive map , 2016 .
[29] Robert Givan,et al. Equivalence notions and model minimization in Markov decision processes , 2003, Artif. Intell.
[30] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[31] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[32] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res.
[33] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[34] Peter Dayan,et al. Improving Generalization for Temporal Difference Learning: The Successor Representation , 1993, Neural Computation.
[35] Doina Precup,et al. Bisimulation Metrics for Continuous Markov Decision Processes , 2011, SIAM J. Comput.
[36] Satinder Singh,et al. Value Prediction Network , 2017, NIPS.
[37] Richard Bellman. Adaptive Control Processes: A Guided Tour , 1961, Princeton University Press.
[38] Wolfram Burgard,et al. Deep reinforcement learning with successor features for navigation across similar environments , 2017, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[39] Lihong Li,et al. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning , 2008, ICML '08.
[40] Michael L. Littman,et al. State Abstractions for Lifelong Reinforcement Learning , 2018, ICML.
[41] Doina Precup,et al. Representation Discovery for MDPs Using Bisimulation Metrics , 2015, AAAI.
[42] Sergey Levine,et al. Learning Robust Rewards with Adversarial Inverse Reinforcement Learning , 2017, ICLR.
[43] Benjamin Van Roy,et al. (More) Efficient Reinforcement Learning via Posterior Sampling , 2013, NIPS.
[44] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[45] Joelle Pineau,et al. Combined Reinforcement Learning via Abstract Representations , 2018, AAAI.
[46] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[47] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents , 2012, J. Artif. Intell. Res.
[48] George Konidaris,et al. Value Function Approximation in Reinforcement Learning Using the Fourier Basis , 2011, AAAI.
[49] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[50] Csaba Szepesvári,et al. Approximate Policy Iteration with Linear Action Models , 2012, AAAI.
[51] Stefanie Tellex,et al. Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning , 2017, ArXiv.
[52] Demis Hassabis,et al. Mastering Atari, Go, chess and shogi by planning with a learned model , 2019, Nature.
[53] Samuel Gershman,et al. Predictive representations can link model-based reinforcement learning to model-free mechanisms , 2017, bioRxiv.
[54] Alborz Geramifard,et al. Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping , 2008, UAI.
[55] Tom Schaul,et al. Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement , 2018, ICML.
[56] Richard Bellman. Adaptive Control Processes: A Guided Tour , 1961, Princeton University Press.