Paper information

Non-commercial Research and Educational Use including without Limitation Use in Instruction at Your Institution, Sending It to Specific Colleagues That You Know, and Providing a Copy to Your Institution's Administrator. All Other Uses, Reproduction and Distribution, including without Limitation Comm

M. Kawato | K. Samejima | Keiji Tanaka | Takeo Watanabe

[1] Mitsuo Kawato, et al. Internal models for motor control and trajectory planning, 1999, Current Opinion in Neurobiology.

[2] M. Kawato, et al. Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus-action-reward association learning, 2006, Journal of Neurophysiology.

[3] Andrew G. Barto, et al. Reinforcement learning, 1998.

[4] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.

[5] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.

[6] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[7] Saori C. Tanaka, et al. Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops, 2004, Nature Neuroscience.

[8] Jun Morimoto, et al. Hierarchical reinforcement learning for motion learning: learning 'stand-up' trajectories, 1998, Adv. Robotics.

[9] Junichiro Yoshimoto, et al. Control of exploitation-exploration meta-parameter in reinforcement learning, 2002, Neural Networks.

[10] Timothy E. J. Behrens, et al. Optimal decision making and the anterior cingulate cortex, 2006, Nature Neuroscience.

[11] A. Dickinson, et al. Neuronal coding of prediction errors, 2000, Annual Review of Neuroscience.

[12] C. Padoa-Schioppa, et al. Neurons in the orbitofrontal cortex encode economic value, 2006, Nature.

[13] J. Wickens, et al. Neural mechanisms of reward-related motor learning, 2003, Current Opinion in Neurobiology.

[14] P. Glimcher, et al. Activity in Posterior Parietal Cortex Is Correlated with the Relative Subjective Desirability of Action, 2004, Neuron.

[15] Mitsuo Kawato, et al. Multiple Model-Based Reinforcement Learning, 2002, Neural Computation.

[16] W. Newsome, et al. Matching Behavior and the Representation of Value in the Parietal Cortex, 2004, Science.

[17] K. Doya, et al. Representation of Action-Specific Reward Values in the Striatum, 2005, Science.

[18] Yutaka Sakai, et al. Computational algorithms and neuronal network models underlying decision processes, 2006, Neural Networks.

[19] E. Miller, et al. Different time courses of learning-related activity in the prefrontal cortex and striatum, 2005, Nature.

[20] Colin Camerer, et al. Neural Systems Responding to Degrees of Uncertainty in Human Decision-Making, 2005, Science.

[21] Ziv M. Williams, et al. Selective enhancement of associative learning by microstimulation of the anterior caudate, 2006, Nature Neuroscience.

[22] Karl J. Friston, et al. Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning, 2004, Science.

[23] Yasushi Kobayashi, et al. Reward predicting activity of pedunculopontine tegmental nucleus neurons during visually guided saccade tasks, 2005.

[24] Yasushi Kobayashi, et al. Contribution of pedunculopontine tegmental nucleus neurons to performance of visually guided saccade tasks in monkeys, 2002, Journal of Neurophysiology.

[25] Stuart J. Russell, et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.

[26] O. Hikosaka, et al. A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping, 2004, Journal of Neurophysiology.

[27] W. Schultz, et al. Adaptive Coding of Reward Value by Dopamine Neurons, 2005, Science.

[28] Jürgen Schmidhuber, et al. HQ-Learning, 1997, Adapt. Behav.

[29] Jonathan D. Cohen, et al. Imaging valuation models in human choice, 2006, Annual Review of Neuroscience.

[30] Kenji Doya, et al. Meta-learning in Reinforcement Learning, 2003, Neural Networks.

[31] P. Dayan, et al. Cortical substrates for exploratory decisions in humans, 2006, Nature.

[32] J. O'Doherty, et al. Human Neural Learning Depends on Reward Prediction Errors in the Blocking Paradigm, 2005, Journal of Neurophysiology.

[33] P. Dayan, et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control, 2005, Nature Neuroscience.

[34] M. Walton, et al. Separate neural pathways process different decision costs, 2006, Nature Neuroscience.

[35] Kenji Doya, et al. Metalearning and neuromodulation, 2002, Neural Networks.

[36] S. Quartz, et al. Neural Differentiation of Expected Reward and Risk in Human Subcortical Structures, 2006, Neuron.

[37] Richard S. Sutton, et al. Planning by Incremental Dynamic Programming, 1991, ML.

[38] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[39] Mitsuo Kawato, et al. MOSAIC Model for Sensorimotor Learning and Control, 2001, Neural Computation.

[40] Xiao-Jing Wang, et al. Neural mechanism for stochastic behaviour during a competitive game, 2006, Neural Networks.

[41] R. Dolan, et al. Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans, 2006, Nature.

[42] W. Schultz, et al. Discrete Coding of Reward Probability and Uncertainty by Dopamine Neurons, 2003, Science.

[43] Andrew Y. Ng, et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, 1999, ICML.

[44] Kae Nakamura, et al. Role of Dopamine in the Primate Caudate Nucleus in Reward Modulation of Saccades, 2006, The Journal of Neuroscience.

[45] D. Barraclough, et al. Prefrontal cortex and decision making in a mixed-strategy game, 2004, Nature Neuroscience.

[46] Mitsuo Kawato, et al. Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning, 2006, Neural Networks.

[47] Satinder Singh. Transfer of Learning by Composing Solutions of Elemental Sequential Tasks, 1992, Mach. Learn.

[48] Mitsuo Kawato, et al. Inter-module credit assignment in modular reinforcement learning, 2003, Neural Networks.

[49] O. Hikosaka, et al. Reward-predicting activity of dopamine and caudate neurons--a possible mechanism of motivational control of saccadic eye movement, 2004, Journal of Neurophysiology.

[50] M. Roesch, et al. Orbitofrontal cortex, decision-making and drug addiction, 2006, Trends in Neurosciences.

[51] J. O'Doherty, et al. The Role of the Ventromedial Prefrontal Cortex in Abstract State-Based Inference during Decision Making in Humans, 2006, The Journal of Neuroscience.

[52] Keiji Tanaka, et al. Neuronal Correlates of Goal-Based Motor Selection in the Prefrontal Cortex, 2003, Science.

[53] W. Newsome, et al. Choosing the greater of two goods: neural currencies for valuation and decision making, 2005, Nature Reviews Neuroscience.

[54] K. Doya, et al. A Neural Correlate of Reward-Based Behavioral Learning in Caudate Nucleus: A Functional Magnetic Resonance Imaging Study of a Stochastic Decision Task, 2004, The Journal of Neuroscience.

[55] Kenji Doya, et al. Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics, 2006, Neural Networks.

[56] E. Vaadia, et al. Midbrain dopamine neurons encode decisions for future action, 2006, Nature Neuroscience.

[57] M. Roesch, et al. Encoding of Time-Discounted Rewards in Orbitofrontal Cortex Is Independent of Value Representation, 2006, Neuron.

[58] S. Haber. The primate basal ganglia: parallel and integrative networks, 2003, Journal of Chemical Neuroanatomy.

[59] H. Seung, et al. Linear-Nonlinear-Poisson Models of Primate Choice Dynamics, 2005, Journal of the Experimental Analysis of Behavior, 84(3), 581–617.

[60] K. Doya. Complementary roles of basal ganglia and cerebellum in learning and motor control, 2000, Current Opinion in Neurobiology.

[61] S. Ishii, et al. Resolution of Uncertainty in Prefrontal Cortex, 2006, Neuron.

[62] Wolfram Schultz, et al. Relative reward processing in primate striatum, 2005, Experimental Brain Research.

[63] Joel L. Davis, et al. Adaptive Critics and the Basal Ganglia, 1995.

[64] Tatsuo K. Sato, et al. Correlated Coding of Motivation and Outcome of Decision by Dopamine Neurons, 2003, The Journal of Neuroscience.

[65] Peter Dayan, et al. Temporal difference models describe higher-order learning in humans, 2004, Nature.

[66] P. Glimcher, et al. Dynamic Response-by-Response Models of Matching Behavior in Rhesus Monkeys, 2005, Journal of the Experimental Analysis of Behavior, 84(3), 555–579.

[67] D. M. Wolpert, et al. Multiple paired forward and inverse models for motor control, 1998, Neural Networks.

[68] A. Redish, et al. Addiction as a Computational Process Gone Awry, 2004, Science.

[69] Kiyohiko Nakamura. Neural representation of information measure in the primate premotor cortex, 2006, Journal of Neurophysiology.

[70] D. Barraclough, et al. Reinforcement learning and decision making in monkeys during a competitive game, 2004, Brain Research. Cognitive Brain Research.

[71] Richard S. Sutton, et al. Dimensions of Reinforcement Learning, 1998.

[72] E. Vaadia, et al. Coincident but Distinct Messages of Midbrain Dopamine and Striatal Tonically Active Neurons, 2004, Neuron.

[73] J. Paul Bolam, et al. Pedunculopontine nucleus and basal ganglia: distant relatives or part of the same family?, 2004, Trends in Neurosciences.