Scalable reinforcement learning through hierarchical decompositions for weakly-coupled problems
Jochen Triesch | Constantin A. Rothkopf | Hazem Toutounji