Modeling sensory-motor decisions in natural behavior

Although standard reinforcement learning models capture many aspects of reward-seeking behavior, they may be impractical for modeling natural human behavior, given the richness of dynamic environments and the limits of human cognitive resources. We propose a modular reinforcement learning model that addresses both factors. Building on this model, we develop a modular inverse reinforcement learning algorithm that estimates both rewards and discount factors from human behavioral data, enabling accurate prediction of human navigation behavior in virtual reality across subjects and tasks. An artificial agent based on the modular model can reproduce complex human navigation trajectories in novel environments. The model thus provides a strategy for estimating the subjective value of actions and how such values shape sensory-motor decisions in natural behavior.
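The core idea of the modular decomposition can be illustrated with a minimal sketch: each module learns its own Q-values under its own reward function and discount factor, and the agent acts greedily on the sum of module Q-values. This is an illustrative toy (the corridor, module names, reward magnitudes, and learning settings are all assumptions, not the paper's implementation):

```python
class Module:
    """One behavioral module: its own reward function and discount factor."""

    def __init__(self, reward_fn, gamma, n_states, n_actions, alpha=0.5):
        self.reward_fn = reward_fn
        self.gamma = gamma
        self.alpha = alpha
        self.Q = [[0.0] * n_actions for _ in range(n_states)]

    def update(self, s, a, s_next):
        # Per-module Q-learning backup with the module's own reward and gamma.
        r = self.reward_fn(s, a, s_next)
        target = r + self.gamma * max(self.Q[s_next])
        self.Q[s][a] += self.alpha * (target - self.Q[s][a])


def combined_action(modules, s):
    # Modular action selection: act greedily on the SUM of module Q-values.
    n_actions = len(modules[0].Q[s])
    totals = [sum(m.Q[s][a] for m in modules) for a in range(n_actions)]
    return max(range(n_actions), key=totals.__getitem__)


# Toy corridor: states 0..4, actions 0 = left, 1 = right; state 4 is the
# goal, state 2 holds an obstacle. All numbers here are illustrative.
N, GOAL, OBSTACLE = 5, 4, 2

def step(s, a):
    return min(max(s + (1 if a == 1 else -1), 0), N - 1)

goal_module = Module(lambda s, a, s2: 1.0 if s2 == GOAL else 0.0,
                     gamma=0.9, n_states=N, n_actions=2)
obstacle_module = Module(lambda s, a, s2: -1.0 if s2 == OBSTACLE else 0.0,
                         gamma=0.5, n_states=N, n_actions=2)
modules = [goal_module, obstacle_module]

# Deterministic sweeps over all state-action pairs stand in for exploration.
for _ in range(100):
    for s in range(N):
        for a in range(2):
            s_next = step(s, a)
            for m in modules:
                m.update(s, a, s_next)

policy = [combined_action(modules, s) for s in range(N)]
print(policy)  # → [1, 1, 1, 1, 1]: the goal reward outweighs the penalty
```

With these settings the goal module dominates and the agent walks through the obstacle; raising the obstacle penalty (e.g. to -2.0) tips the summed Q-values at state 1 in favor of retreating, showing how the relative module rewards and discount factors jointly determine the composite policy, which is what the inverse algorithm recovers from behavior.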
