Goal-Proximity Decision-Making

Reinforcement learning (RL) models of decision-making cannot account for human decisions made in the absence of prior reward or punishment. We propose a mechanism for choosing among available options based on goal-option association strengths, where the association strength between two objects reflects previously experienced proximity between them. The proposed mechanism, Goal-Proximity Decision-making (GPD), is implemented within the ACT-R cognitive architecture. In three maze-navigation simulations, GPD proves more efficient than RL, and its advantage appears to grow as task difficulty increases. We also present an experiment in which participants make choices in the absence of prior reward; GPD captures human performance in this experiment better than RL does.
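The core idea of the mechanism can be sketched in a few lines: association strengths between objects grow when those objects are observed near each other, and the agent then prefers the option most strongly associated with the current goal. This is a minimal illustrative sketch only; the class name, the saturating update rule, and the learning rate are assumptions for exposition, not the paper's actual ACT-R implementation.

```python
import random
from collections import defaultdict

class GPDAgent:
    """Hypothetical sketch of goal-proximity decision-making (GPD)."""

    def __init__(self, learning_rate=0.1):
        # (object_a, object_b) -> association strength in [0, 1)
        self.strength = defaultdict(float)
        self.lr = learning_rate  # illustrative learning rate

    def observe_proximity(self, a, b):
        """Strengthen the symmetric association between two nearby objects."""
        for pair in ((a, b), (b, a)):
            # Saturating update: strength approaches 1 with repeated exposure.
            self.strength[pair] += self.lr * (1.0 - self.strength[pair])

    def choose(self, options, goal):
        """Pick the option most associated with the goal; break ties randomly."""
        best = max(self.strength[(o, goal)] for o in options)
        return random.choice(
            [o for o in options if self.strength[(o, goal)] == best]
        )
```

For example, an agent that has seen a door near the goal object will later prefer that door over a never-associated alternative, even though no reward has ever been delivered, which is the contrast with RL drawn in the abstract.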
