Non-commercial Research and Educational Use including without Limitation Use in Instruction at Your Institution, Sending It to Specific Colleagues That You Know, and Providing a Copy to Your Institution's Administrator. All Other Uses, Reproduction and Distribution, including without Limitation Comm

[1]  Mitsuo Kawato,et al.  Internal models for motor control and trajectory planning , 1999, Current Opinion in Neurobiology.

[2]  M. Kawato,et al.  Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus-action-reward association learning. , 2006, Journal of neurophysiology.

[3]  Andrew G. Barto,et al.  Reinforcement learning , 1998 .

[4]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[5]  Peter Dayan,et al.  A Neural Substrate of Prediction and Reward , 1997, Science.

[6]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[7]  Saori C. Tanaka,et al.  Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops , 2004, Nature Neuroscience.

[8]  Jun Morimoto,et al.  Hierarchical reinforcement learning for motion learning: learning 'stand-up' trajectories , 1998, Adv. Robotics.

[9]  Junichiro Yoshimoto,et al.  Control of exploitation-exploration meta-parameter in reinforcement learning , 2002, Neural Networks.

[10]  Timothy E. J. Behrens,et al.  Optimal decision making and the anterior cingulate cortex , 2006, Nature Neuroscience.

[11]  A. Dickinson,et al.  Neuronal coding of prediction errors. , 2000, Annual review of neuroscience.

[12]  C. Padoa-Schioppa,et al.  Neurons in the orbitofrontal cortex encode economic value , 2006, Nature.

[13]  J. Wickens,et al.  Neural mechanisms of reward-related motor learning , 2003, Current Opinion in Neurobiology.

[14]  P. Glimcher,et al.  Activity in Posterior Parietal Cortex Is Correlated with the Relative Subjective Desirability of Action , 2004, Neuron.

[15]  Mitsuo Kawato,et al.  Multiple Model-Based Reinforcement Learning , 2002, Neural Computation.

[16]  W. Newsome,et al.  Matching Behavior and the Representation of Value in the Parietal Cortex , 2004, Science.

[17]  K. Doya,et al.  Representation of Action-Specific Reward Values in the Striatum , 2005, Science.

[18]  Yutaka Sakai,et al.  Computational algorithms and neuronal network models underlying decision processes , 2006, Neural Networks.

[19]  E. Miller,et al.  Different time courses of learning-related activity in the prefrontal cortex and striatum , 2005, Nature.

[20]  Colin Camerer,et al.  Neural Systems Responding to Degrees of Uncertainty in Human Decision-Making , 2005, Science.

[21]  Ziv M. Williams,et al.  Selective enhancement of associative learning by microstimulation of the anterior caudate , 2006, Nature Neuroscience.

[22]  Karl J. Friston,et al.  Dissociable Roles of Ventral and Dorsal Striatum in Instrumental Conditioning , 2004, Science.

[23]  Yasushi Kobayashi,et al.  Reward predicting activity of pedunculopontine tegmental nucleus neurons during visually guided saccade tasks , 2005 .

[24]  Yasushi Kobayashi,et al.  Contribution of pedunculopontine tegmental nucleus neurons to performance of visually guided saccade tasks in monkeys. , 2002, Journal of neurophysiology.

[25]  Stuart J. Russell,et al.  Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.

[26]  O. Hikosaka,et al.  A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping. , 2004, Journal of neurophysiology.

[27]  W. Schultz,et al.  Adaptive Coding of Reward Value by Dopamine Neurons , 2005, Science.

[28]  Jürgen Schmidhuber,et al.  HQ-Learning , 1997, Adapt. Behav..

[29]  Jonathan D. Cohen,et al.  Imaging valuation models in human choice. , 2006, Annual review of neuroscience.

[30]  Kenji Doya,et al.  Meta-learning in Reinforcement Learning , 2003, Neural Networks.

[31]  P. Dayan,et al.  Cortical substrates for exploratory decisions in humans , 2006, Nature.

[32]  J. O'Doherty,et al.  Human Neural Learning Depends on Reward Prediction Errors in the Blocking Paradigm , 2005, Journal of Neurophysiology.

[33]  P. Dayan,et al.  Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.

[34]  M. Walton,et al.  Separate neural pathways process different decision costs , 2006, Nature Neuroscience.

[35]  Kenji Doya,et al.  Metalearning and neuromodulation , 2002, Neural Networks.

[36]  S. Quartz,et al.  Neural Differentiation of Expected Reward and Risk in Human Subcortical Structures , 2006, Neuron.

[37]  Richard S. Sutton,et al.  Planning by Incremental Dynamic Programming , 1991, ML.

[38]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[39]  Mitsuo Kawato,et al.  MOSAIC Model for Sensorimotor Learning and Control , 2001, Neural Computation.

[40]  Xiao-Jing Wang,et al.  Neural mechanism for stochastic behaviour during a competitive game , 2006, Neural Networks.

[41]  R. Dolan,et al.  Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans , 2006, Nature.

[42]  W. Schultz,et al.  Discrete Coding of Reward Probability and Uncertainty by Dopamine Neurons , 2003, Science.

[43]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[44]  Kae Nakamura,et al.  Role of Dopamine in the Primate Caudate Nucleus in Reward Modulation of Saccades , 2006, The Journal of Neuroscience.

[45]  D. Barraclough,et al.  Prefrontal cortex and decision making in a mixed-strategy game , 2004, Nature Neuroscience.

[46]  Mitsuo Kawato,et al.  Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning , 2006, Neural Networks.

[47]  Satinder Singh Transfer of Learning by Composing Solutions of Elemental Sequential Tasks , 1992, Mach. Learn..

[48]  Mitsuo Kawato,et al.  Inter-module credit assignment in modular reinforcement learning , 2003, Neural Networks.

[49]  O. Hikosaka,et al.  Reward-predicting activity of dopamine and caudate neurons--a possible mechanism of motivational control of saccadic eye movement. , 2004, Journal of neurophysiology.

[50]  M. Roesch,et al.  Orbitofrontal cortex, decision-making and drug addiction , 2006, Trends in Neurosciences.

[51]  J. O'Doherty,et al.  The Role of the Ventromedial Prefrontal Cortex in Abstract State-Based Inference during Decision Making in Humans , 2006, The Journal of Neuroscience.

[52]  Keiji Tanaka,et al.  Neuronal Correlates of Goal-Based Motor Selection in the Prefrontal Cortex , 2003, Science.

[53]  W. Newsome,et al.  Choosing the greater of two goods: neural currencies for valuation and decision making , 2005, Nature Reviews Neuroscience.

[54]  K. Doya,et al.  A Neural Correlate of Reward-Based Behavioral Learning in Caudate Nucleus: A Functional Magnetic Resonance Imaging Study of a Stochastic Decision Task , 2004, The Journal of Neuroscience.

[55]  Kenji Doya,et al.  Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics , 2006, Neural Networks.

[56]  E. Vaadia,et al.  Midbrain dopamine neurons encode decisions for future action , 2006, Nature Neuroscience.

[57]  M. Roesch,et al.  Encoding of Time-Discounted Rewards in Orbitofrontal Cortex Is Independent of Value Representation , 2006, Neuron.

[58]  S. Haber The primate basal ganglia: parallel and integrative networks , 2003, Journal of Chemical Neuroanatomy.

[59]  H. Seung,et al.  JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR 2005, 84, 581–617 NUMBER 3(NOVEMBER) LINEAR-NONLINEAR-POISSON MODELS OF PRIMATE CHOICE DYNAMICS , 2022 .

[60]  K. Doya Complementary roles of basal ganglia and cerebellum in learning and motor control , 2000, Current Opinion in Neurobiology.

[61]  S. Ishii,et al.  Resolution of Uncertainty in Prefrontal Cortex , 2006, Neuron.

[62]  Wolfram Schultz,et al.  Relative reward processing in primate striatum , 2005, Experimental Brain Research.

[63]  Joel L. Davis,et al.  Adaptive Critics and the Basal Ganglia , 1995 .

[64]  Tatsuo K Sato,et al.  Correlated Coding of Motivation and Outcome of Decision by Dopamine Neurons , 2003, The Journal of Neuroscience.

[65]  Peter Dayan,et al.  Temporal difference models describe higher-order learning in humans , 2004, Nature.

[66]  P. Glimcher,et al.  JOURNAL OF THE EXPERIMENTAL ANALYSIS OF BEHAVIOR 2005, 84, 555–579 NUMBER 3(NOVEMBER) DYNAMIC RESPONSE-BY-RESPONSE MODELS OF MATCHING BEHAVIOR IN RHESUS MONKEYS , 2022 .

[67]  D M Wolpert,et al.  Multiple paired forward and inverse models for motor control , 1998, Neural Networks.

[68]  A. Redish,et al.  Addiction as a Computational Process Gone Awry , 2004, Science.

[69]  Kiyohiko Nakamura Neural representation of information measure in the primate premotor cortex. , 2006, Journal of neurophysiology.

[70]  D. Barraclough,et al.  Reinforcement learning and decision making in monkeys during a competitive game. , 2004, Brain research. Cognitive brain research.

[71]  Richard S. Sutton,et al.  Dimensions of Reinforcement Learning , 1998 .

[72]  E. Vaadia,et al.  Coincident but Distinct Messages of Midbrain Dopamine and Striatal Tonically Active Neurons , 2004, Neuron.

[73]  J. Paul Bolam,et al.  Pedunculopontine nucleus and basal ganglia: distant relatives or part of the same family? , 2004, Trends in Neurosciences.