Paper information - Learning from errors by counterfactual reasoning in a unified cognitive architecture

Learning from Errors by Counterfactual Reasoning in a Unified Cognitive Architecture

Andreea Danielescu (lavinia.danielescu@asu.edu)
David J. Stracuzzi (david.stracuzzi@gmail.com)
Nan Li (nanan9177@gmail.com)
Pat Langley (langley@asu.edu)
Computing Science and Engineering
Arizona State University, Tempe, AZ 85287 USA

Abstract

A key characteristic of human cognition is the ability to learn from undesirable outcomes. This paper presents a computational account of learning from errors based on counterfactual reasoning, which we embed in Icarus, a unified theory of the cognitive architecture. Our approach acquires new skills from single experiences that improve upon and mask those that initially produced the undesirable behavior. We demonstrate the operation of this model in a simulated urban driving environment. We also relate our approach to other research on error-driven learning and discuss possible improvements to the framework.

Keywords: cognitive architecture, learning from error, counterfactual reasoning, problem solving

Background and Motivation

The ability to acquire knowledge from experience is a fundamental component of human intelligence. There exist many accounts of learning from positive experiences, most often based on successful problem-solving attempts (Anzai & Simon, 1979; Laird, Rosenbloom, & Newell, 1986). In this paper, we focus instead on learning from undesirable outcomes, an ability that plays an important role in human cognition by providing a mechanism for avoiding past failures in the future. We provide a computational model for one type of error-driven learning that uses counterfactual reasoning to determine both the error’s cause and the correct behavior.

Counterfactual reasoning is a strategy that considers what might have occurred if causal events were changed in some way. Psychological studies suggest that people employ counterfactual reasoning in a variety of situations (Roese, Hur, & Pennington, 1999).
Byrne and McEleney (2000) also show that people tend to employ counterfactual reasoning mainly in response to negative outcomes, such as failure to achieve or maintain goals. Finally, Epstude and Roese (2008) make the connection to learning, based on their theory that the primary motivation for counterfactual reasoning is to improve future performance.

The work described here offers a computational account of the role of counterfactual reasoning in learning from failures. We embed this account within Icarus (Langley & Choi, 2006), a unified theory of the human cognitive architecture that makes a commitment to hierarchical, composable knowledge structures. We claim that these structures, along with the mechanisms for using and acquiring them, provide Icarus with basic support for benefiting from undesirable outcomes. Our approach to learning from errors responds to a single negative experience, which distinguishes it from connectionist, reinforcement-based, and Bayesian techniques, which typically require many experiences.

We begin our discussion with a motivating task domain and a review of the Icarus architecture. After this, we present our approach to learning from errors via counterfactual reasoning, including methods for determining the source of the error, acquiring new concepts and skills in response, and utilizing these structures in future behavior. We then describe the extended architecture’s operation in the task domain, discuss connections to other work on error-driven learning, and consider directions for further research in this important area.

An Illustrative Domain: Urban Driving

In modern society, the task of operating a vehicle in an urban setting is both common and cognitively challenging. People perform a variety of tasks in this context, such as navigation, obstacle avoidance, and signal response, along with higher-level tasks such as package delivery.
Successful performance relies on substantial domain expertise, making urban driving a useful domain in which to study embedded cognition and learning. For this reason, we have developed a three-dimensional urban driving environment based on the Torque Game Engine produced by Garage Games (http://www.garagegames.com). The driving simulator provides the driver with control over the gas pedal, brake and steering, with objects obeying realistic laws of physics. The simulator also generates detailed perceptual information, in egocentric polar coordinates, about nearby entities, including road segments, intersections, lane lines, buildings, pedestrians, and other vehicles.

The driving task we examine here requires the agent to overtake a stalled vehicle, which it decides to pass on the left. However, in taking this step, the agent crosses a double yellow line, thereby violating the rules of driving and risking collision with oncoming traffic. The problem is not that the agent lacks knowledge about this constraint; the error occurs because it was focusing on a different goal that interacts with the one it violates.

[1] David W. Aha, et al. Constructing Game Agents from Video of Human Behavior, 2009, AIIDE.

[2] N. Roese, et al. Counterfactual thinking and regulatory focus: implications for action versus inaction and sufficiency versus necessity, 1999, Journal of personality and social psychology.

[3] David J. Stracuzzi, et al. Representing and Reasoning over Time in a Symbolic Cognitive Architecture, 2009.

[4] John R. Anderson, et al. Rules of the Mind, 1993.

[5] N. Roese, et al. The Functional Theory of Counterfactual Thinking, 2008, Personality and social psychology review: an official journal of the Society for Personality and Social Psychology, Inc.

[6] Stellan Ohlsson, et al. Learning from Performance Errors, 1996.

[7] Allen Newell, et al. Chunking in Soar: The anatomy of a general learning mechanism, 1985, Machine Learning.

[8] Michael G. Dyer, et al. Towards a computational theory of human daydreaming, 1998, ArXiv.

[9] Douglas J. Pearson. Learning Procedural Planning Knowledge in Complex Environments, 1996, AAAI/IAAI, Vol. 2.

[10] Allen Newell, et al. GPS, a program that simulates human thought, 1995.

[11] Pat Langley, et al. A general theory of discrimination learning, 1987.

[12] G. Wells, et al. Mental Simulation of Causality, 1989.

[13] N. Roese. Counterfactual thinking, 1997, Psychological bulletin.

[14] A. Newell. Unified Theories of Cognition, 1990.

[15] Roger C. Schank, et al. Explanation Patterns: Understanding Mechanically and Creatively, 1986.

[16] Pat Langley, et al. A Unified Cognitive Architecture for Physical Agents, 2006, AAAI.

[17] Donald Nute, et al. Counterfactuals, 1975, Notre Dame J. Formal Log.

[18] Oren Etzioni, et al. PRODIGY: an integrated architecture for planning and learning, 1991, SGAR.

[19] H A Simon, et al. The theory of learning by doing, 1979, Psychological review.

[20] R. Byrne, et al. Counterfactual thinking about actions and failures to act, 2000, Journal of experimental psychology. Learning, memory, and cognition.