Evidence for Hyperbolic Temporal Discounting of Reward in Control of Movements

Suppose that the purpose of a movement is to place the body in a more rewarding state. In this framework, slower movements may increase accuracy and therefore improve the probability of acquiring reward, but their longer durations devalue the reward. Here we hypothesize that the brain decides the vigor of a movement (its duration and velocity) based on the expected discounted reward associated with that movement. We begin by showing that the durations of saccades of varying amplitude can be accurately predicted by a model in which motor commands maximize expected discounted reward. This result suggests that reward is temporally discounted even on timescales of tens of milliseconds. One interpretation of temporal discounting is that the true objective of the brain is to maximize the rate of reward, which is equivalent to a specific form of hyperbolic discounting. A consequence of this idea is that the vigor of saccades should change as one alters the intertrial intervals between movements. We find experimentally that in healthy humans, as intertrial intervals are varied, saccade peak velocities and durations change on a trial-by-trial basis precisely as predicted by a model in which the objective is to maximize the rate of reward. Our results are inconsistent with theories in which reward is discounted exponentially. We suggest that there exists a single cost, rate of reward, which provides a unifying principle that may govern control of movements on timescales of milliseconds, as well as decision making on timescales of seconds to years.
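The contrast between hyperbolic and exponential discounting can be illustrated with a small numerical sketch. It is not the model fitted in the paper. The accuracy curve `p_success`, the parameter values (`k`, `alpha`, `beta`, `tau`), and the grid search are all assumptions chosen for illustration. The sketch assumes that the reward is discounted over the total time to reward, that is, the intertrial interval plus the movement duration.

```python
import math


def p_success(d, k=0.05):
    """Hypothetical accuracy curve: probability of reward rises with duration d (s)."""
    return 1.0 - math.exp(-d / k)


def hyperbolic_value(d, iti, alpha=1.0, beta=1.0):
    """Expected reward discounted hyperbolically over the total delay iti + d."""
    return alpha * p_success(d) / (1.0 + beta * (iti + d))


def exponential_value(d, iti, alpha=1.0, tau=1.0):
    """Expected reward discounted exponentially over the total delay iti + d."""
    return alpha * p_success(d) * math.exp(-(iti + d) / tau)


def best_duration(value_fn, iti):
    """Grid search (1 ms steps, up to 1 s) for the duration that maximizes value."""
    durations = [i * 0.001 for i in range(1, 1001)]
    return max(durations, key=lambda d: value_fn(d, iti))


if __name__ == "__main__":
    for iti in (0.5, 1.0, 2.0):
        d_hyp = best_duration(hyperbolic_value, iti)
        d_exp = best_duration(exponential_value, iti)
        print(f"ITI={iti:.1f} s  hyperbolic: {d_hyp:.3f} s  exponential: {d_exp:.3f} s")
```

In this toy setting the exponential optimum does not move with the intertrial interval, because the interval only multiplies the value by a constant factor. The hyperbolic optimum shifts as the interval changes, so only the hyperbolic form predicts that vigor depends on the intertrial interval.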
