Running head: DUAL LEARNING PROCESSES IN INTERACTIVE SKILL ACQUISITION

Dual Learning Processes in Interactive Skill Acquisition

Wai-Tat Fu
University of Illinois at Urbana-Champaign

Acquisition of interactive skills involves the use of internal and external cues. Experiment 1 showed that when actions were interdependent, learning was effective with and without external cues in the single-task condition but was effective only in the presence of external cues in the dual-task condition. In the dual-task condition, actions closer to the feedback were learned faster than actions farther away, but this difference was reversed in the single-task condition. Experiment 2 tested how knowledge acquired in single- and dual-task conditions would transfer to a new reward structure. Results confirmed the two forms of learning mediated by the secondary task: a declarative memory encoding process that simultaneously assigned credits to actions and a reinforcement-learning process that slowly propagated credits backwards from the feedback. The results showed that both forms of learning were engaged during training; only at the response selection stage may one form of knowledge dominate the other, depending on the availability of attentional resources.

Dual Learning Processes in Interactive Skill Acquisition

The acquisition of interactive skills often involves the retention and utilization of internal cues (e.g., episodic memory of previous actions) and the recognition of relevant external cues (e.g., signs on the walls). The use of both internal and external cues to inform actions is common in most human-technology interactions (e.g., Card, Moran, & Newell, 1983; Fu & Pirolli, 2007), in which operators have to encode and recall previous actions and their outcomes and interpret ongoing information displayed on an interface to decide what to do next. A better understanding of the interactive processes involved in the use of these internal and external cues is critical for predicting learning and performance across different designs of human-technology interfaces.
Understanding interactive skills requires the careful study of how internal and external cues are processed during the interplay of cognition and perception (Ballard, Hayhoe, Pook, & Rao, 1997; Fu & Anderson, in press; Fu & Gray, 2000, 2004, 2006; Gray & Fu, 2004; Gray, Sims, Fu, & Schoelles, 2006; Larkin, 1989). Our focus is on one of the most typical situations: immediate feedback on individual actions is not available, so the learner has to perform a sequence of actions and receives feedback on their success only at the end of the sequence. This kind of situation creates a difficult credit-assignment problem, in which the learner has to assign credits to the earlier actions or external cues that are responsible for eventual success. The credit-assignment problem is even more difficult in dynamic environments in which the action outcomes are probabilistic and interdependent, and the environment may change both autonomously and as a result of the actions. In this case, either memory of previous actions or recognition of the correct cues in the external environment is required to assign credits to the appropriate actions. Based on recent findings from psychological research, we hypothesize that humans exhibit two distinct learning processes to solve this credit-assignment problem: a declarative process that encodes internal memory cues for actions and outcomes, and a reinforcement-learning process that encodes the relationship between the external cues and action outcomes (e.g., Daw, Niv, & Dayan, 2005; Grafton, Hazeltine, & Ivry, 1995; Keele, Ivry, Mayr, Hazeltine, & Heuer, 2003; Knowlton, Squire, & Gluck, 1994; Knowlton, Mangels, & Squire, 1996; Packard & Knowlton, 2002; Poldrack, Clark, Pare-Blagoev, Shohamy, Moyano, Myers, & Gluck, 2001; Squire, 1992; Waldron & Ashby, 2001).
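The contrast between the two hypothesized processes can be illustrated with a minimal sketch. It is not the model used in this article: the update rules, the learning rate `alpha`, and all function names are assumptions chosen for illustration. One rule credits every action in the sequence at once when feedback arrives, as an episodic record of the whole sequence would allow; the other is a one-step temporal-difference rule in which only the last action is updated directly by the feedback, so credit reaches earlier actions over repeated episodes.

```python
def td_episode(values, reward, alpha=0.5):
    """One episode of a one-step temporal-difference update (illustrative).

    values[t] is the learner's estimate for the action taken at step t.
    Only the final step is updated toward the reward itself; every earlier
    step is updated toward the current estimate of the step that follows it.
    """
    last = len(values) - 1
    updated = list(values)
    for t in range(len(values)):
        target = reward if t == last else values[t + 1]
        updated[t] = values[t] + alpha * (target - values[t])
    return updated


def episodic_episode(values, reward, alpha=0.5):
    """One episode in which every action is credited simultaneously.

    This stands in for a learner that remembers the whole sequence and
    links all of its actions to the outcome at once (an assumption).
    """
    return [v + alpha * (reward - v) for v in values]


def train(update, n_steps=3, n_episodes=2, reward=1.0):
    """Apply an update rule for n_episodes, starting from zero estimates."""
    values = [0.0] * n_steps
    for _ in range(n_episodes):
        values = update(values, reward)
    return values


if __name__ == "__main__":
    print(train(td_episode))        # [0.0, 0.25, 0.75]
    print(train(episodic_episode))  # [0.75, 0.75, 0.75]
```

After two rewarded episodes, the temporal-difference learner has credited the action nearest the feedback most and the first action not at all, whereas the simultaneous rule credits all three actions equally. The sketch is meant only to show the direction in which credit spreads under each rule, not to reproduce the experimental results.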
One major source of evidence for this distinction comes from studies showing that amnesic patients can perform some tasks (e.g., artificial grammar, sequence, or category learning) despite their lack of declarative access to the events of training (e.g., Foerde, Knowlton, & Poldrack, 2006; Knowlton et al., 1994; Nissen & Bullemer, 1987), whereas patients with basal ganglia disorders show major impairment on similar tasks (Poldrack et al., 2001). Furthermore, recent findings from neuroscience show that neural activities in the basal ganglia correlate well with the predictions of reinforcement learning when learning occurs in various probabilistic reward structures (e.g., Schultz, Dayan, & Montague, 1997). These two sets of results suggest that there exist two distinct learning systems for probabilistic events.

In this article, we describe results from two experiments designed to tease apart the nature of these two learning systems in the acquisition of interactive skills. We define interactive skills broadly in terms of learning of action sequences in situations that depend critically on the utilization of external cues. Our experiments bridge two lines of psychological research: First, learning the sequential nature of actions is related to the research on sequence learning; second, learning the probabilistic relationship among external cues, previous actions, and their outcomes is related to the research on probability learning and probabilistic classification.

Sequence Learning

The dual learning processes have often been investigated through a paradigm called sequence learning (e.g., Cohen, Ivry, & Keele, 1990; Curran & Keele, 1993; Nissen & Bullemer, 1987; Willingham, Nissen, & Bullemer, 1989). One typical paradigm is the serial reaction time (SRT) task, in which participants have to press a sequence of keys as indicated by a sequence of lights.
A certain pattern of button presses recurs regularly, and participants give evidence of learning this sequence by pressing the keys for this sequence faster than for a random sequence. One common finding is that learning is observed as a facilitation of test performance without concomitant awareness of what is being learned. A number of studies have used a secondary task, such as counting tones, to study the effects of diminished attention on sequence learning (e.g., Cohen et al., 1990; Curran & Keele, 1993; Nissen & Bullemer, 1987). Cohen et al. (1990) found that when attention was diminished by a secondary task, participants could learn only simple pairwise transitions and failed to learn higher order hierarchical structures in the sequence. The results show that although sequence learning can occur with diminished attention, its scope is limited to cases in which simple associations of stimuli are sufficient to perform the task.

Probability Learning and Probabilistic Classification

In a typical probability-learning experiment, participants guess which of the alternatives will occur and then receive feedback on their guesses (e.g., Estes, 1964). One robust finding is that participants often “probability match”; that is, they choose a particular alternative with the same probability that it is reinforced (e.g., Friedman, Burke, Cole, Keller, Millward, & Estes, 1964). This has led many to propose that probability matching is the result of a habit-learning mechanism that accumulates information about the probabilistic structure of the environment (e.g., Graybiel, 1995; Knowlton et al., 1994). One important characteristic of this kind of habit learning is that information is acquired gradually across many trials and appears to be independent of declarative memory, as amnesic patients could learn to perform a probabilistic classification task (Knowlton et al., 1994; but see Gallistel, 2005).
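The definition of probability matching given above has a simple consequence for accuracy, which the following minimal sketch works out. It assumes two alternatives, one reinforced with probability `p` on each trial independently of the participant's choice; the function names are illustrative and do not come from the studies cited. A participant who matches chooses the first alternative with probability `p`, so the guess is correct with probability `p * p + (1 - p) * (1 - p)`, whereas always choosing the more frequently reinforced alternative is correct with probability `max(p, 1 - p)`.

```python
def matching_accuracy(p):
    """Expected proportion correct under probability matching.

    Assumes two alternatives; alternative A is reinforced with probability p
    and is chosen with that same probability, independently of the outcome.
    """
    return p * p + (1.0 - p) * (1.0 - p)


def maximizing_accuracy(p):
    """Expected proportion correct when always choosing the likelier alternative."""
    return max(p, 1.0 - p)


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9):
        print(p, round(matching_accuracy(p), 2), round(maximizing_accuracy(p), 2))
```

For example, with `p = 0.7`, matching yields an expected accuracy of about 0.58, compared with 0.70 for always choosing the more frequent alternative. The two strategies coincide only when `p` is 0.5 or when one alternative is always reinforced.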
However, for non-amnesic human participants, it is difficult to determine whether this kind of probabilistic classification is independent of the use of declarative memory. Given that declarative memory is dominant in humans, it has been argued that learners often initially engage in declarative memory encoding, in which they seek to remember sequential patterns even when there are none (Yellott, 1969). Researchers argue that true probabilistic trial-by-trial behavior appears only after hundreds of trials; perhaps by then participants give up the idea of explicitly encoding patterns and the habit-learning process becomes dominant (Estes, 2002; Vulkan, 2000). Similarly, recent research on complex category learning has provided results suggesting dual learning systems (Allen & Brooks, 1991; Ashby, Queller, & Berretty, 1999; Waldron & Ashby, 2001).

Present Approach

In both the sequence-learning and the probability-learning paradigms, participants do not need to learn from delayed feedback, because immediate feedback is given after each action. In a typical SRT task there is a sequence of actions, but there is a deterministic relationship (given by instructions) between the stimuli and their responses. Participants in the SRT task may anticipate the next stimuli, but they always get immediate feedback after their responses. In probability learning the stimulus-response relationship is probabilistic, but there is a single action after which feedback is received. Neither paradigm, then, directly reflects the complexity of the credit-assignment problem in interactive tasks, in which people often have to learn to sequentially choose actions with probabilistic outcomes and receive feedback only after the whole action sequence is executed. Our studies combine research from both areas by studying the nature of the learning processes that assign credits to different actions and external cues in a probabilistic sequential choice task.
In this task, a sequence of actions was executed before feedback on its correctness was received, and a particular action sequence was correct only with a certain probability. Our goal is to use this novel paradigm to investigate the nature of the dual learning processes in the general context of interactive skill learning when the learner has to choose the right action sequences by utilizing either memory or

[1]  John R. Anderson,et al.  Solving the credit assignment problem: explicit and implicit learning of action sequences with probabilistic outcomes , 2008, Psychological research.

[2]  Wayne D. Gray,et al.  The soft constraints hypothesis: a rational analysis approach to resource allocation for interactive behavior. , 2006, Psychological review.

[3]  Wayne D. Gray,et al.  Suboptimal tradeoffs in information seeking , 2006, Cognitive Psychology.

[4]  P. Dayan,et al.  Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.

[5]  Charles R. Gallistel,et al.  Deconstructing the law of effect , 2005, Games Econ. Behav..

[6]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[7]  Wai-Tat Fu,et al.  Resolving the paradox of the active user: stable suboptimal performance in interactive tasks , 2004, Cogn. Sci..

[8]  Wai-Tat Fu,et al.  Soft constraints in interactive behavior: the case of ignoring perfect knowledge in-the-world for imperfect knowledge in-the-head , 2004, Cogn. Sci..

[9]  Richard I. Ivry,et al.  Attention and Structure in Sequence Learning , 2004 .

[10]  S. Keele,et al.  The cognitive and neural architecture of sequence representation. , 2003, Psychological review.

[11]  J. Gabrieli,et al.  Direct comparison of neural systems mediating conscious and unconscious skill learning. , 2002, Journal of neurophysiology.

[12]  Corey J Bohil,et al.  Observational versus feedback training in rule-based and information-integration category learning , 2002, Memory & cognition.

[13]  W. Estes,et al.  Traps in the route to models of memory and decision , 2002, Psychonomic bulletin & review.

[14]  B. Knowlton,et al.  Learning and memory functions of the Basal Ganglia. , 2002, Annual review of neuroscience.

[15]  M. Gluck,et al.  Interactive memory systems in the human brain , 2001, Nature.

[16]  F. Ashby,et al.  The effects of concurrent task interference on category learning: Evidence for multiple category learning systems , 2001, Psychonomic bulletin & review.

[17]  L. Brooks,et al.  Specializing the operation of an explicit rule , 1991 .

[18]  Nir Vulkan An Economist's Perspective on Probability Matching , 2000 .

[19]  Wai-Tat Fu,et al.  Memory versus Perceptual-Motor Tradeoffs in a Blocks World Task , 2000 .

[20]  Patricia M. Berretty,et al.  On the dominance of unidimensional rules in unsupervised categorization , 1999, Perception & psychophysics.

[21]  Gregory Ashby,et al.  A neuropsychological theory of multiple systems in category learning. , 1998, Psychological review.

[22]  Peter Dayan,et al.  A Neural Substrate of Prediction and Reward , 1997, Science.

[23]  Rajesh P. N. Rao,et al.  Embodiment is the foundation, not a level , 1996, Behavioral and Brain Sciences.

[24]  Jennifer A. Mangels,et al.  A Neostriatal Habit Learning System in Humans , 1996, Science.

[25]  A. Graybiel Building action repertoires: memory and learning functions of the basal ganglia , 1995, Current Opinion in Neurobiology.

[26]  Scott T. Grafton,et al.  Functional Mapping of Sequence Learning in Normal Humans , 1995, Journal of Cognitive Neuroscience.

[27]  M. Gluck,et al.  Probabilistic classification learning in amnesia. , 1994, Learning & memory.

[28]  Tim Curran,et al.  Attentional and Nonattentional Forms of Sequence Learning , 1993 .

[29]  Stellan Ohlsson,et al.  The Interaction Between Knowledge and Practice in the Acquisition of Cognitive Skills , 1993 .

[30]  L. Squire Memory and the hippocampus: a synthesis from findings with rats, monkeys, and humans. , 1992, Psychological review.

[31]  R. Herrnstein,et al.  Melioration: A Theory of Distributed Choice , 1991 .

[32]  J. H. Larkin,et al.  Display-based problem solving , 1989 .

[33]  Daniel B. Willingham,et al.  On the development of procedural knowledge. , 1989, Journal of experimental psychology. Learning, memory, and cognition.

[34]  John Sweller,et al.  Cognitive Load During Problem Solving: Effects on Learning , 1988, Cogn. Sci..

[35]  Colin Potts,et al.  Design of Everyday Things , 1988 .

[36]  Gregory Ashby,et al.  Decision rules in the perception and categorization of multidimensional stimuli. , 1988, Journal of experimental psychology. Learning, memory, and cognition.

[37]  G. Logan Toward an instance theory of automatization. , 1988 .

[38]  M. Nissen,et al.  Attentional requirements of learning: Evidence from performance measures , 1987, Cognitive Psychology.

[39]  Herbert A. Simon,et al.  Why a Diagram is (Sometimes) Worth Ten Thousand Words , 1987, Cogn. Sci..

[40]  Stephen W. Draper,et al.  Display Managers as the Basis for User-Machine Communication , 1986 .

[41]  Allen Newell,et al.  The psychology of human-computer interaction , 1983 .

[42]  Allen Newell, Paul S. Rosenbloom.  Mechanisms of Skill Acquisition and the Law of Practice , 1993 .

[43]  John R. Anderson Acquisition of cognitive skill. , 1982 .

[44]  W Vaughan,et al.  Melioration, matching, and maximization. , 1981, Journal of the experimental analysis of behavior.

[45]  R. Rescorla,et al.  A theory of Pavlovian conditioning : Variations in the effectiveness of reinforcement and nonreinforcement , 1972 .

[46]  J. Yellott Probability learning with noncontingent success , 1969 .

[47]  M E Bitterman,et al.  Probability Learning. , 1962, Science.