Proceedings of the Annual Meeting of the Cognitive Science Society

Are People Successful at Learning Sequential Decisions on a Perceptual Matching Task?

Reiko Yakushijin (yaku@cl.aoyama.ac.jp)
Department of Psychology, Aoyama Gakuin University, Shibuya, Tokyo, 150-8366, Japan

Robert A. Jacobs (robbie@bcs.rochester.edu)
Department of Brain & Cognitive Sciences, University of Rochester, Rochester, NY 14627, USA

Abstract

Sequential decision-making tasks are commonplace in our everyday lives. We report the results of an experiment in which human subjects were trained to perform a perceptual matching task, an instance of a sequential decision-making task. We use two benchmarks to evaluate the quality of subjects' learning. One benchmark is based on optimal performance as defined by a dynamic programming procedure. The other is based on an adaptive computational agent that uses a reinforcement learning method known as Q-learning to learn to perform the task. Our analyses suggest that subjects learned to perform the perceptual matching task in a near-optimal manner at the end of training. Subjects were able to achieve near-optimal performance because they learned, at least partially, the causal structure underlying the task. Subjects' learning curves were broadly consistent with those of model-based reinforcement-learning agents that built and used internal models of how their actions influenced the external environment. We hypothesize that, in general, people will achieve near-optimal performances on sequential decision-making tasks when they can detect the effects of their actions on the environment, and when they can represent and reason about these effects using an internal mental model.

Keywords: sequential decision making; optimal performance; dynamic programming; reinforcement learning

Introduction

Tasks requiring people to make a sequence of decisions to reach a goal are commonplace in our lives. When playing chess, a person must choose a sequence of chess moves to capture an opponent's king. When driving to work, a person must choose a sequence of left and right turns to arrive at work in a timely manner. And when pursuing financial goals, a person must choose a sequence of saving and spending options to achieve a financial target. Interest in sequential decision-making tasks among cognitive scientists has increased dramatically in recent years (e.g., Busemeyer, 2002; Chhabra & Jacobs, 2006; Fu & Anderson, 2006; Gibson, Fichman, & Plaut, 1997; Gureckis & Love, 2009; Lee, 2006; Sutton & Barto, 1998; Shanks, Tunney, & McCarthy, 2002).

Here, we are interested in whether people are successful at learning to perform sequential decision-making tasks. There are at least two ways in which the quality of learning can be evaluated. These ways differ in terms of the benchmark to which the performances of a learner are compared. One way uses a benchmark of optimal performance on a task. Analyses based on optimal performance are referred to as ideal observer analyses, ideal actor analyses, or rational analyses in the literatures on perception, motor control, and cognition, respectively. At each moment during training with a task, a learner's performance can be compared to the optimal performance for that task. If a learner achieves near-optimal performance at the end of training, then it can be claimed that the learner has been successful.

A second way of evaluating a learner is to compare the learner's performances with those of an adaptive computational agent that is trained to perform the same task. We consider here an agent that learns via "reinforcement learning" methods developed by researchers interested in artificial intelligence (Sutton & Barto, 1998). Cognitive scientists have begun to use reinforcement learning methods to develop new theories of biological learning (e.g., Busemeyer & Pleskac, 2009; Daw & Touretzky, 2002; Schultz, Dayan, & Montague, 1997; Fu & Anderson, 2006). To date, however, there are few comparisons of the learning curves of people and agents based on reinforcement learning methods. Because reinforcement learning is regarded as effective and well-understood from an engineering perspective, and as plausible from psychological and neurophysiological perspectives, the performances of agents based on this form of learning can provide useful benchmarks for evaluating a person's learning.

If a person's performance during training improves at the same rate as that of a reinforcement-learning agent, then it can be argued that the person is a successful learner. If a person's performance improves at a slower rate, then the person is not learning as much from experience as he or she could learn. Experimentation is often required to identify the cognitive "bottlenecks" preventing the person from learning faster. Lastly, if a person's performance improves at a faster rate, then this suggests that the person is using information sources or information processing operations that are not available to the agent. A new, more complex agent should be considered in this case.

We report the results of an experiment in which human subjects were trained to perform a perceptual matching task. This task was designed to contain a number of desirable features. Importantly, the perceptual matching task is an instance of a sequential decision-making task. Subjects made a sequence of decisions (or, equivalently, took a sequence of actions) to modify an environmental state to a goal state. In addition, efficient performance on the perceptual matching task required knowledge of how different properties of an environment interacted with each other. In many everyday tasks, people are required to understand the interactions, or "causal relations", among multiple components (Busemeyer, 2002; Gopnik &
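The two benchmarks described above can be made concrete in a small sketch. The Python code below uses a hypothetical toy navigation task (not the perceptual matching task used in the experiment; the states, actions, and rewards are assumptions chosen for illustration). It first computes optimal state values by dynamic programming (value iteration), playing the role of the optimal-performance benchmark, and then trains a tabular Q-learning agent (Watkins & Dayan, 1992) on the same task, playing the role of the adaptive-agent benchmark; comparing the two shows the sense in which a model-free learner's estimates can approach the dynamic-programming optimum.

```python
import random

random.seed(0)  # reproducibility of the epsilon-greedy exploration

# Hypothetical toy task: states 0..N-1 lie on a line; actions move one
# step left or right; entering state N-1 ends the episode with reward +1,
# and every other step costs -0.01.
N = 7
ACTIONS = (-1, +1)
GAMMA = 0.95

def step(s, a):
    """Deterministic dynamics: (state, action) -> (next_state, reward, done)."""
    s2 = min(max(s + a, 0), N - 1)
    if s2 == N - 1:
        return s2, 1.0, True
    return s2, -0.01, False

def value_iteration(tol=1e-8):
    """Benchmark 1: optimal state values via dynamic programming."""
    V = [0.0] * N  # V[N-1] is terminal and stays 0
    while True:
        delta = 0.0
        for s in range(N - 1):
            best = max(r + GAMMA * V[s2] * (not done)
                       for s2, r, done in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def q_learning(episodes=2000, alpha=0.1, eps=0.1):
    """Benchmark 2: a model-free tabular Q-learning agent (epsilon-greedy)."""
    Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < eps:
                a = random.choice(ACTIONS)          # explore
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])  # exploit
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

V_star = value_iteration()
Q = q_learning()
# The agent's greedy value estimates should approach the optimal values.
for s in range(N - 1):
    print(s, round(V_star[s], 3), round(max(Q[(s, a)] for a in ACTIONS), 3))
```

In this deterministic setting the Q-learning agent's greedy policy converges to the optimal (always rightward) policy, which is the comparison the paper draws at the level of human learning curves; the paper's model-based agents additionally learn the `step` function itself rather than being given it.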

[1] Gureckis, T. M., & Love, B. C. (2009). Short-term gains, long-term pains: How cues about state aid learning in dynamic environments. Cognition.

[2] Bellman, R. (1957). Dynamic programming. Science.

[3] Chhabra, M., & Jacobs, R. A. (2006). Near-optimal human adaptive control across different noise environments. The Journal of Neuroscience.

[4] Busemeyer, J. R. (2015). Dynamic decision making.

[5] Gielis, J. (2003). A generic geometric transformation that unifies a wide range of natural and abstract shapes. American Journal of Botany.

[6] Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning.

[7] Shanks, D. R., Tunney, R. J., & McCarthy, J. D. (2002). A re-examination of probability matching and rational choice.

[8] Fu, W.-T., & Anderson, J. R. (2006). From recurrent choice to skill learning: A reinforcement-learning model. Journal of Experimental Psychology: General.

[9] Gibson, F. P., Fichman, M., & Plaut, D. C. (1997). Learning in dynamic decision tasks: Computational model and empirical evidence.

[10] Watkins, C. J. C. H. (1989). Learning from delayed rewards.

[11] Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science.

[12] Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction.

[13] Gray, W. D., et al. (2006). Melioration dominates maximization: Stable suboptimal performance despite global feedback.

[14] Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning.

[15] Powell, W. B., et al. (2006). Handbook of Learning and Approximate Dynamic Programming.

[16] Legge, G. E., et al. (2006). Lost in virtual space: Studies in human and ideal spatial navigation. Journal of Experimental Psychology: Human Perception and Performance.

[17] Daw, N. D., & Touretzky, D. S. (2002). Long-term reward prediction in TD models of the dopamine system. Neural Computation.

[18] Lee, M. D. (2006). A hierarchical Bayesian model of human decision-making on an optimal stopping problem. Cognitive Science.

[19] Busemeyer, J. R., & Pleskac, T. J. (2009). Theoretical tools for understanding and aiding dynamic decision making.

[20] Gopnik, A., & Schulz, L. (Eds.). (2007). Causal learning: Psychology, philosophy, and computation.