Solving the Credit Assignment Problem: The Interaction of Explicit and Implicit Learning with Internal and External State Information

Wai-Tat Fu (wfu@cmu.edu)
Human Factors Division and Beckman Institute
University of Illinois at Urbana-Champaign
1 Airport Road, Savoy, IL 61874, USA

John R. Anderson (ja+@cmu.edu)
Department of Psychology
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213, USA

Abstract

In most problem-solving activities, feedback is received at the end of an action sequence. This creates a credit-assignment problem in which the learner must associate the feedback with earlier actions, and the interdependencies of actions require the learner either to remember past choices of actions (internal state information) or to rely on external cues in the environment (external state information) to select the right actions. We investigated the nature of explicit and implicit learning processes in the credit-assignment problem using a probabilistic sequential choice task with and without external state information. We found that when explicit memory encoding was dominant, subjects were faster to select the better option in their first choices than in their last choices; when implicit reinforcement learning was dominant, subjects were faster to select the better option in their last choices than in their first choices. However, implicit reinforcement learning was only successful when distinct external state information was available. The results suggest the nature of learning in credit assignment: an explicit memory encoding process that keeps track of internal state information and a reinforcement-learning process that uses state information to propagate reinforcement backwards to previous choices. However, the implicit reinforcement learning process is effective only when the valences can be attributed to the appropriate states in the system, either internally generated states in the cognitive system or externally presented stimuli in the environment.

Introduction

Consider a person navigating in a large office building. The person has to decide when to turn left or right at various hallway intersections. The decisions in the sequence are interdependent: for example, turning left at a particular hallway intersection will affect the decisions at the next intersections. The person may therefore need to keep track of previous actions to inform what actions to take in the future. In reality, memory of previous actions (internal state information) may not be necessary, as people can explicitly seek information in the environment (external state information) to learn where they are located or which direction to go to reach a destination (Fu & Gray, 2006). Learning to navigate is therefore likely to involve both the retention of internal state information (memory) and the recognition of external state information (signs on the walls). Indeed, many have argued that real-world skills often involve the interplay between cognition (internal), perception, and action (external), and that understanding these interactive skills requires careful study of how internal information (memory) and external information (cues in the environment) are processed during learning (Ballard, 1997; Fu & Gray, 2000, 2004; Gray & Fu, 2004; Larkin, 1989; Gray, Sims, Fu, & Schoelles, in press).

The navigation problem above is an example of one of the most difficult situations in skill learning: the learner has to perform a sequence of actions but only gets feedback on their success at the end of the sequence (e.g., when the destination is reached). This creates a credit-assignment problem, in which the learner has to assign credit to the earlier actions that are responsible for eventual success. When actions are interdependent, either memory of previous actions or recognition of the correct problem state in the external environment is required to assign credit to the appropriate actions. In this article, we present results from an experiment in which we study how people learn to solve the credit-assignment problem in a simple but challenging example of such a situation. Our focus is on the recent proposal that humans exhibit two distinct learning processes, and we apply it to the learning of action sequences with delayed feedback: an explicit process (with awareness) that requires memory for actions and outcomes, and an implicit process (without awareness) that does not require such memory. We will first review research in some related areas that informed the design of our experiment.

Explicit and Implicit Learning

Probability Learning and Classification

There have been numerous studies on the learning of the probabilistic relationship between choices and their consequences. The simplest situation is the probability-learning experiment, in which subjects guess which of the alternatives will occur and then receive feedback on their guesses (e.g., Estes, 1964). One robust finding is that subjects often "probability match"; that is, they will choose a particular alternative with the same probability that it is reinforced (e.g., Friedman et al., 1964). This leads many to propose that probability matching is the result of an implicit
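As an illustration of the probability-matching finding described above, the following minimal sketch compares the expected accuracy of a learner who matches the reinforcement probability with one who always picks the more frequently reinforced alternative. The two-alternative setup, the function name `expected_accuracy`, and the value 0.7 are assumptions made for illustration only; they are not taken from the experiment reported here.

```python
def expected_accuracy(p_reinforced: float, p_choose: float) -> float:
    """Expected proportion of correct guesses in a two-alternative task.

    p_reinforced: probability that alternative A is the correct one on a trial.
    p_choose: probability that the learner guesses A, independent of the outcome.
    """
    return p_reinforced * p_choose + (1 - p_reinforced) * (1 - p_choose)


# Hypothetical reinforcement probability (illustrative, not from the paper).
p = 0.7

matching = expected_accuracy(p, p_choose=p)      # guess A on 70% of trials
maximizing = expected_accuracy(p, p_choose=1.0)  # always guess A

print(f"matching: {matching:.2f}, maximizing: {maximizing:.2f}")
```

Under these assumptions, matching yields an expected accuracy of 0.7 × 0.7 + 0.3 × 0.3 = 0.58, whereas always choosing the more frequently reinforced alternative yields 0.70, which is why probability matching is usually regarded as suboptimal in this simple setting.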

[1] Wayne D. Gray, et al. The soft constraints hypothesis: a rational analysis approach to resource allocation for interactive behavior, 2006, Psychological Review.

[2] Wayne D. Gray, et al. Suboptimal tradeoffs in information seeking, 2006, Cognitive Psychology.

[3] John R. Anderson, et al. From recurrent choice to skill learning: a reinforcement-learning model, 2006, Journal of Experimental Psychology: General.

[4] P. Dayan, et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control, 2005, Nature Neuroscience.

[5] R. Sun, et al. The interaction of the explicit and the implicit in skill learning: a dual-process approach, 2005, Psychological Review.

[6] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[7] Wai-Tat Fu, et al. Resolving the paradox of the active user: stable suboptimal performance in interactive tasks, 2004, Cogn. Sci.

[8] Wai-Tat Fu, et al. Soft constraints in interactive behavior: the case of ignoring perfect knowledge in-the-world for imperfect knowledge in-the-head, 2004, Cogn. Sci.

[9] R. Mathews, et al. Role of Implicit and Explicit Processes in Learning From Examples: A Synergistic Effect, 2004.

[10] W. Estes, et al. Traps in the route to models of memory and decision, 2002, Psychonomic Bulletin & Review.

[11] M. Gluck, et al. Interactive memory systems in the human brain, 2001, Nature.

[12] F. Ashby, et al. The effects of concurrent task interference on category learning: Evidence for multiple category learning systems, 2001, Psychonomic Bulletin & Review.

[13] L. Brooks, et al. Specializing the operation of an explicit rule, 1991.

[14] Nir Vulkan. An Economist's Perspective on Probability Matching, 2000.

[15] Wai-Tat Fu, et al. Memory versus Perceptual-Motor Tradeoffs in a Blocks World Task, 2000.

[16] Patricia M. Berretty, et al. On the dominance of unidimensional rules in unsupervised categorization, 1999, Perception & Psychophysics.

[17] Daniel B. Willingham, et al. A Neuropsychological Theory of Motor Skill Learning, 2004.

[18] Peter Dayan, et al. A Neural Substrate of Prediction and Reward, 1997, Science.

[19] Rajesh P. N. Rao, et al. Embodiment is the foundation, not a level, 1996, Behavioral and Brain Sciences.

[20] Jennifer A. Mangels, et al. A Neostriatal Habit Learning System in Humans, 1996, Science.

[21] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.

[22] A. Graybiel. Building action repertoires: memory and learning functions of the basal ganglia, 1995, Current Opinion in Neurobiology.

[23] D. Shanks, et al. Characteristics of dissociable human learning systems, 1994, Behavioral and Brain Sciences.

[24] M. Gluck, et al. Probabilistic classification learning in amnesia, 1994, Learning & Memory.

[25] Tim Curran, et al. Attentional and Nonattentional Forms of Sequence Learning, 1993.

[26] A. Reber. Implicit learning and tacit knowledge, 1993.

[27] James L. McClelland, et al. Learning the structure of event sequences, 1991, Journal of Experimental Psychology: General.

[28] Richard I. Ivry, et al. Attention and structure in sequence learning, 1990.

[29] Kenneth Kotovsky, et al. Complex Information Processing: The Impact of Herbert A. Simon, 1989.

[30] J. H. Larkin, et al. Display-based problem solving, 1989.

[31] Daniel B. Willingham, et al. On the development of procedural knowledge, 1989, Journal of Experimental Psychology: Learning, Memory, and Cognition.

[32] M. Nissen, et al. Attentional requirements of learning: Evidence from performance measures, 1987, Cognitive Psychology.

[33] J. Yellott. Probability learning with noncontingent success, 1969.

[34] A. W. Melton. Categories of Human Learning, 1964.

[35] M. E. Bitterman, et al. Probability Learning, 1962, Science.