Predictive representations can link model-based reinforcement learning to model-free mechanisms
[1] F. Ciancia. Tolman and Honzik (1930) revisited or the mazes of psychology (1930-1980), 1991.
[2] Samuel Gershman, et al. Design Principles of the Hippocampal Cognitive Map, 2014, NIPS.
[3] Amir Dezfouli, et al. Speed/Accuracy Trade-Off between the Habitual and the Goal-Directed Processes, 2011, PLoS Comput. Biol.
[4] G. E. Alexander, et al. Functional architecture of basal ganglia circuits: neural substrates of parallel processing, 1990, Trends in Neurosciences.
[5] P. Glimcher, et al. Value Representations in the Primate Striatum during Matching Behavior, 2008, Neuron.
[6] Colin Camerer, et al. Experience‐weighted Attraction Learning in Normal Form Games, 1999.
[7] Rajesh P. N. Rao, et al. Spike-Timing-Dependent Hebbian Plasticity as Temporal Difference Learning, 2001, Neural Computation.
[8] James L. McClelland, et al. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory, 1995, Psychological Review.
[9] A. Faure, et al. Lesion to the Nigrostriatal Dopamine System Disrupts Stimulus-Response Habit Formation, 2005, The Journal of Neuroscience.
[10] Shantanu P. Jadhav, et al. Interplay between Hippocampal Sharp-Wave-Ripple Events and Vicarious Trial and Error Behaviors in Decision Making, 2016, Neuron.
[11] Robert C. Wilson, et al. Orbitofrontal Cortex as a Cognitive Map of Task Space, 2014, Neuron.
[12] A. Markman, et al. Retrospective Revaluation in Sequential Decision Making: A Tale of Two Systems, 2012, Journal of Experimental Psychology: General.
[13] Martin Egelhaaf, et al. Prototypical Components of Honeybee Homing Flight Behavior Depend on the Visual Appearance of Objects Surrounding the Goal, 2012, Front. Behav. Neurosci.
[14] Hugo J. Spiers, et al. Solving the detour problem in navigation: a model of prefrontal and hippocampal interactions, 2015, Front. Hum. Neurosci.
[15] B. Balleine, et al. The role of the dorsomedial striatum in instrumental conditioning, 2005, The European Journal of Neuroscience.
[16] Peter Dayan, et al. Simple Plans or Sophisticated Habits? State, Transition and Learning Interactions in the Two-Step Task, 2015, bioRxiv.
[17] C. Thinus-Blanc, et al. Route planning in cats, in relation to the visibility of the goal, 1983, Animal Behaviour.
[18] T. Sejnowski, et al. Neurocomputational models of working memory, 2000, Nature Neuroscience.
[19] Tao Wang, et al. Dual Representations for Dynamic Programming and Reinforcement Learning, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[20] Nathaniel D. Daw, et al. Value Learning through Reinforcement, 2014.
[21] Joel L. Davis, et al. A Model of How the Basal Ganglia Generate and Use Neural Signals That Predict Reinforcement, 1994.
[22] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SIGART Bulletin.
[23] Giovanni Pezzulo, et al. The Mixed Instrumental Controller: Using Value of Information to Combine Habitual Choice and Mental Simulation, 2013, Front. Psychol.
[24] B. Balleine, et al. Human and Rodent Homologies in Action Control: Corticostriatal Determinants of Goal-Directed and Habitual Action, 2010, Neuropsychopharmacology.
[25] P. Dayan, et al. Adaptive integration of habits into depth-limited planning defines a habitual-goal–directed spectrum, 2016, Proceedings of the National Academy of Sciences.
[26] Nicolas W. Schuck, et al. Human Orbitofrontal Cortex Represents a Cognitive Map of State Space, 2016, Neuron.
[27] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.
[28] K. Doya, et al. Representation of Action-Specific Reward Values in the Striatum, 2005, Science.
[29] John-Dylan Haynes, et al. Human anterior prefrontal cortex encodes the ‘what’ and ‘when’ of future intentions, 2012, NeuroImage.
[30] M. Botvinick, et al. The hippocampus as a predictive map, 2016.
[31] M. Wilson, et al. Disruption of ripple‐associated hippocampal activity during rest impairs spatial learning in the rat, 2009, Hippocampus.
[32] Andrew M. Wikenheiser, et al. Over the river, through the woods: cognitive maps in the hippocampus and orbitofrontal cortex, 2016, Nature Reviews Neuroscience.
[33] N. Daw, et al. Deciding How To Decide: Self-Control and Meta-Decision Making, 2015, Trends in Cognitive Sciences.
[34] Per B. Sederberg, et al. The Successor Representation and Temporal Context, 2012, Neural Computation.
[35] Ilana B. Witten, et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target, 2016, Nature Neuroscience.
[36] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[37] E. Tolman. Cognitive maps in rats and men, 1948, Psychological Review.
[38] B. Balleine, et al. Sensitivity to Instrumental Contingency Degradation Is Mediated by the Entorhinal Cortex and Its Efferents via the Dorsal Hippocampus, 2002, The Journal of Neuroscience.
[39] Andrew W. Moore, et al. Prioritized sweeping: Reinforcement learning with less data and less time, 2004, Machine Learning.
[40] Timothy Edward John Behrens, et al. Two Anatomically and Computationally Distinct Learning Signals Predict Changes to Stimulus-Outcome Associations in Hippocampus, 2016, Neuron.
[41] Alice Alvernhe, et al. Different CA1 and CA3 Representations of Novel Routes in a Shortcut Situation, 2008, The Journal of Neuroscience.
[42] M. Corballis. Wandering tales: evolutionary origins of mental time travel and language, 2013, Front. Psychol.
[43] B. McNaughton, et al. Reactivation of Hippocampal Cell Assemblies: Effects of Behavioral State, Experience, and EEG Dynamics, 1999, The Journal of Neuroscience.
[44] P. Dayan, et al. Tonic dopamine: opportunity costs and the control of response vigor, 2007, Psychopharmacology.
[45] E. Miller, et al. An integrative theory of prefrontal cortex function, 2001, Annual Review of Neuroscience.
[46] G. Buzsáki. Two-stage model of memory trace formation: A role for “noisy” brain states, 1989, Neuroscience.
[47] R. Dolan, et al. Dopamine Enhances Model-Based over Model-Free Choice Behavior, 2012, Neuron.
[48] Dylan A. Simon, et al. Neural Correlates of Forward Planning in a Spatial Decision Task in Humans, 2011, The Journal of Neuroscience.
[49] P. Dayan, et al. The algorithmic anatomy of model-based evaluation, 2014, Philosophical Transactions of the Royal Society B: Biological Sciences.
[50] N. Daw, et al. Variability in Dopamine Genes Dissociates Model-Based and Model-Free Reinforcement Learning, 2016, The Journal of Neuroscience.
[51] Ari Weinstein, et al. Model-based hierarchical reinforcement learning and human action control, 2014, Philosophical Transactions of the Royal Society B: Biological Sciences.
[52] P. Dayan, et al. Decision theory, reinforcement learning, and the brain, 2008, Cognitive, Affective & Behavioral Neuroscience.
[53] Nathaniel D. Daw, et al. Grid Cells, Place Cells, and Geodesic Generalization for Spatial Reinforcement Learning, 2011, PLoS Comput. Biol.
[54] Peter Dayan. Improving Generalization for Temporal Difference Learning: The Successor Representation, 1993, Neural Computation.
[55] B. McNaughton, et al. Reactivation of hippocampal ensemble memories during sleep, 1994, Science.
[56] R. Suri. Anticipatory responses of dopamine neurons and cortical neurons reproduced by internal model, 2001, Experimental Brain Research.
[57] P. Dayan, et al. States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning, 2010, Neuron.
[58] Lauren V. Kustner, et al. Shaping of Object Representations in the Human Medial Temporal Lobe Based on Temporal Regularities, 2012, Current Biology.
[59] M. Botvinick, et al. The successor representation in human reinforcement learning, 2016, Nature Human Behaviour.
[60] Richard S. Sutton, et al. TD Models: Modeling the World at a Mixture of Time Scales, 1995, ICML.
[61] B. Balleine, et al. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning, 2004, The European Journal of Neuroscience.
[62] P. Dayan, et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control, 2005, Nature Neuroscience.
[63] Geoffrey Schoenbaum, et al. Midbrain dopamine neurons compute inferred and cached value prediction errors in a common framework, 2016, eLife.
[64] H. Nakahara. Multiplexing signals in reinforcement learning with internal models and dopamine, 2014, Current Opinion in Neurobiology.
[65] Philippe Gaussier, et al. From view cells and place cells to cognitive map learning: processing stages of the hippocampal system, 2002, Biological Cybernetics.
[66] B. Balleine, et al. Habits, action sequences and reinforcement learning, 2012, The European Journal of Neuroscience.
[67] Peter Dayan, et al. Interplay of approximate planning strategies, 2015, Proceedings of the National Academy of Sciences.
[68] Makoto Ito, et al. Model-based action planning involves cortico-cerebellar and basal ganglia networks, 2016, Scientific Reports.
[69] Morris Moscovitch, et al. An investigation of the effects of hippocampal lesions in rats on pre‐ and postoperatively acquired spatial memory in a complex environment, 2010, Hippocampus.
[70] Kate Jeffery, et al. Horizontal biases in rats’ use of three-dimensional space, 2011, Behavioural Brain Research.
[71] P. Glimcher. Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis, 2011, Proceedings of the National Academy of Sciences.
[72] Dylan A. Simon, et al. Model-based choices involve prospective neural activity, 2015, Nature Neuroscience.
[73] P. Dayan, et al. Model-based influences on humans’ choices and striatal prediction errors, 2011, Neuron.
[74] Soo-Young Lee, et al. An Optimization Network for Matrix Inversion, 1987, NIPS.
[75] Matthijs A. A. van der Meer, et al. Expectancies in Decision Making, Reinforcement Learning, and Ventral Striatum, 2009, Frontiers in Neuroscience.
[76] Timothy E. J. Behrens, et al. Online evaluation of novel choices by simultaneous representation of multiple memories, 2013, Nature Neuroscience.
[77] B. Balleine, et al. Motivational control of goal-directed action, 1994.
[78] M. Botvinick, et al. Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective, 2009, Cognition.
[79] M. Botvinick, et al. Neural representations of events arise from temporal community structure, 2013, Nature Neuroscience.
[80] David S. Lorberbaum, et al. Genetic evidence that Nkx2.2 acts primarily downstream of Neurog3 in pancreatic endocrine lineage development, 2017, eLife.
[81] C. H. Honzik, et al. Degrees of hunger, reward and non-reward, and maze learning in rats, and Introduction and removal of reward, and maze performance in rats, 1930.
[82] Anders Lansner, et al. Computing the Local Field Potential (LFP) from Integrate-and-Fire Network Models, 2015, PLoS Comput. Biol.
[83] B. Balleine, et al. Multiple Forms of Value Learning and the Function of Dopamine, 2009.
[84] Kenji Doya. What are the computations of the cerebellum, the basal ganglia and the cerebral cortex?, 1999, Neural Networks.
[85] Takeo Watanabe, et al. Temporally Extended Dopamine Responses to Perceptually Demanding Reward-Predictive Stimuli, 2010, The Journal of Neuroscience.
[86] N. Daw, et al. Dopamine selectively remediates 'model-based' reward learning: a computational approach, 2016, Brain: A Journal of Neurology.
[87] Soo Hong Chew, et al. Dissociable contribution of prefrontal and striatal dopaminergic genes to learning in economic games, 2014, Proceedings of the National Academy of Sciences.
[88] Kenji Doya, et al. Hierarchical control of goal-directed action in the cortical–basal ganglia network, 2015, Current Opinion in Behavioral Sciences.
[89] Petra Himmel, et al. Stevens' Handbook of Experimental Psychology: Learning, Motivation and Emotion, 2016.
[90] G. Buzsáki, et al. Selective suppression of hippocampal ripples impairs spatial memory, 2009, Nature Neuroscience.
[91] Shinsuke Shimojo, et al. Neural Computations Underlying Arbitration between Model-Based and Model-free Learning, 2013, Neuron.
[92] Alec Solway, et al. Goal-directed decision making as probabilistic inference: a computational framework and potential neural correlates, 2012, Psychological Review.
[93] Michael J. Frank, et al. By Carrot or by Stick: Cognitive Reinforcement Learning in Parkinsonism, 2004, Science.
[94] D. Shohamy, et al. Preference by Association: How Memory Mechanisms in the Hippocampus Bias Decisions, 2012, Science.
[95] Tom Schaul, et al. Successor Features for Transfer in Reinforcement Learning, 2016, NIPS.
[96] B. Balleine, et al. The Role of Learning in the Operation of Motivational Systems, 2002.
[97] Saori C. Tanaka, et al. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops, 2004, Nature Neuroscience.
[98] J. Danckert, et al. Deficits in reflexive covert attention following cerebellar injury, 2015, Front. Hum. Neurosci.
[99] P. Dayan, et al. A mathematical model explains saturating axon guidance responses to molecular gradients, 2016, eLife.
[100] R. Dolan, et al. Ventral striatal dopamine reflects behavioral and neural signatures of model-based control during sequential decision making, 2015, Proceedings of the National Academy of Sciences.
[101] S. Haber. The primate basal ganglia: parallel and integrative networks, 2003, Journal of Chemical Neuroanatomy.
[102] A. Dickinson. Actions and habits: the development of behavioural autonomy, 1985.
[103] M. Botvinick, et al. Statistical learning of temporal community structure in the hippocampus, 2016, Hippocampus.
[104] C. Thinus-Blanc, et al. The role of exploratory experience in a shortcut task by golden hamsters (Mesocricetus auratus), 1987.
[105] P. Dayan, et al. A framework for mesencephalic dopamine systems based on predictive Hebbian learning, 1996, The Journal of Neuroscience.
[106] L. Frank, et al. Awake Hippocampal Sharp-Wave Ripples Support Spatial Memory, 2012, Science.
[107] Richard S. Sutton, et al. Associative Learning from Replayed Experience, 2017, bioRxiv.
[108] I. Momennejad, et al. Encoding of Prospective Tasks in the Human Prefrontal Cortex under Varying Task Loads, 2013, The Journal of Neuroscience.
[109] Matthijs A. A. van der Meer, et al. Hippocampal Replay Is Not a Simple Function of Experience, 2010, Neuron.