Reward-predictive representations generalize across tasks in reinforcement learning
[1] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[2] Peter Dayan, et al. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[3] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[4] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[5] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[6] P. Dayan, et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning, 1996, The Journal of Neuroscience.
[7] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[8] Robert Givan, et al. Equivalence notions and model minimization in Markov decision processes, 2003, Artif. Intell.
[9] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[10] Michael J. Frank, et al. Hippocampus, cortex, and basal ganglia: Insights from computational models of complementary learning systems, 2004, Neurobiology of Learning and Memory.
[11] Donald E. Knuth, et al. The Art of Computer Programming, Volume 4, Fascicle 2: Generating All Tuples and Permutations, 2005.
[12] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[13] Michael I. Jordan, et al. Hierarchical Dirichlet Processes, 2006.
[14] Thomas J. Walsh, et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.
[15] Jadin C. Jackson, et al. Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling, 2007, Psychological Review.
[16] Peter Stone, et al. Transfer Learning for Reinforcement Learning Domains: A Survey, 2009, J. Mach. Learn. Res.
[17] Samuel J. Gershman, et al. A Tutorial on Bayesian Nonparametric Models, 2011, arXiv:1106.2697.
[18] Doina Precup, et al. Bisimulation Metrics for Continuous Markov Decision Processes, 2011, SIAM J. Comput.
[19] M. Frank, et al. Mechanisms of hierarchical reinforcement learning in cortico-striatal circuits 2: evidence from fMRI, 2012, Cerebral Cortex.
[20] M. Frank, et al. Mechanisms of hierarchical reinforcement learning in corticostriatal circuits 1: computational analysis, 2012, Cerebral Cortex.
[21] Doina Precup, et al. Bisimulation Metrics are Optimal Value Functions, 2014, UAI.
[22] Robert C. Wilson, et al. Orbitofrontal Cortex as a Cognitive Map of Task Space, 2014, Neuron.
[23] Anne G. E. Collins, et al. Opponent actor learning (OpAL): modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive, 2014, Psychological Review.
[24] Doina Precup, et al. Basis refinement strategies for linear value function approximation in MDPs, 2015, NIPS.
[25] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[26] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[27] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.
[28] Samuel Gershman, et al. Deep Successor Reinforcement Learning, 2016, arXiv.
[29] Nicolas W. Schuck, et al. Human Orbitofrontal Cortex Represents a Cognitive Map of State Space, 2016, Neuron.
[30] M. Frank, et al. Neural signature of hierarchically structured expectations predicts clustering and transfer of rule sets in reinforcement learning, 2016, Cognition.
[31] Brad E. Pfeiffer, et al. Reverse Replay of Hippocampal Place Cells Is Uniquely Modulated by Changing Reward, 2016, Neuron.
[32] Michael L. Littman, et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.
[33] M. Botvinick, et al. Statistical learning of temporal community structure in the hippocampus, 2016, Hippocampus.
[34] Wolfram Burgard, et al. Deep reinforcement learning with successor features for navigation across similar environments, 2016, IROS.
[35] Joshua L. Jones, et al. Dopamine transients are sufficient and necessary for acquisition of model-based associations, 2017, Nature Neuroscience.
[36] Kimberly L. Stachenfeld, et al. The hippocampus as a predictive map, 2017, Nature Neuroscience.
[37] Raymond J. Dolan, et al. A map of abstract relational knowledge in the human hippocampal–entorhinal cortex, 2017, eLife.
[38] Donna J. Calu, et al. The Dopamine Prediction Error: Contributions to Associative Models of Reward Learning, 2017, Front. Psychol.
[39] Michael J. Frank, et al. Compositional clustering in task structure learning, 2017.
[40] Stefanie Tellex, et al. Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning, 2017, arXiv.
[41] Samuel J. Gershman, et al. Predictive representations can link model-based reinforcement learning to model-free mechanisms, 2017.
[42] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[43] M. Botvinick, et al. The successor representation in human reinforcement learning, 2016, Nature Human Behaviour.
[44] Michael L. Littman, et al. State Abstractions for Lifelong Reinforcement Learning, 2018, ICML.
[45] Marcelo G. Mattar, et al. Prioritized memory access explains planning and hippocampal replay, 2017, Nature Neuroscience.
[46] Timothy Edward John Behrens, et al. Generalisation of structural knowledge in the Hippocampal-Entorhinal system, 2018, NeurIPS.
[47] Zeb Kurth-Nelson, et al. What Is a Cognitive Map? Organizing Knowledge for Flexible Behavior, 2018, Neuron.
[48] Michael J. Frank, et al. Compositional clustering in task structure learning, 2017, bioRxiv.
[49] Tom Schaul, et al. Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement, 2018, ICML.
[50] Michael J. Frank, et al. Generalizing to generalize: Humans flexibly switch between compositional and conjunctive structures during reinforcement learning, 2019, bioRxiv.
[51] Alyssa A. Carey, et al. Reward revaluation biases hippocampal replay content away from the preferred outcome, 2019, Nature Neuroscience.
[52] Michael L. Littman, et al. Successor Features Support Model-based and Model-free Reinforcement Learning, 2019, arXiv.
[53] Tom Schaul, et al. Universal Successor Features Approximators, 2018, ICLR.
[54] Timothy E. J. Behrens, et al. Human Replay Spontaneously Reorganizes Experience, 2019, Cell.
[55] Michael J. Frank, et al. Generalizing to generalize: when (and when not) to be compositional in task structure learning, 2019.
[56] Joelle Pineau, et al. Combined Reinforcement Learning via Abstract Representations, 2018, AAAI.
[57] Nicolas W. Schuck, et al. Sequential replay of nonspatial task states in the human hippocampus, 2018, Science.
[58] Caswell Barry, et al. The Tolman-Eichenbaum Machine: Unifying Space and Relational Memory through Generalization in the Hippocampal Formation, 2019, Cell.
[59] Michael L. Littman, et al. Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning, 2019, J. Mach. Learn. Res.
[60] Balaraman Ravindran. Approximate Homomorphisms: A framework for non-exact minimization in Markov Decision Processes, 2022.