Cognitive Flexibility through Learning from Constraint Violations

Cognitive flexibility is an important goal in the computational modeling of higher cognition. An agent operating in a world that changes over time should adapt to those changes and update its knowledge accordingly. In this paper, we report on the implementation of a constraint-based mechanism for learning from negative outcomes in a well-established cognitive architecture, ICARUS. We discuss the challenges encountered during the implementation, describe how we solved them, and provide an example of the integrated system's operation.

1. Background and Rationale

An important goal in the computational modeling of higher cognition is to invent techniques that enable computer programs to mimic the broad human functionality that we call adaptability, flexibility, or intelligence. Cognitive flexibility is a multidimensional construct. In this paper, we focus specifically on the ability of humans to act effectively and purposefully even when a familiar task environment is changing, thus rendering previously learned skills and strategies less effective or even obsolete. When the environment changes, the execution of previously acquired skills is likely to generate actions that are inappropriate, incorrect, or unhelpful vis-à-vis the agent's goal. A key component of flexible adaptation to changing circumstances is therefore the ability to recover from and unlearn unsuccessful actions in the service of more effective future behavior (Ohlsson, 2010).

This problem differs from the standard view of skill acquisition in two principal ways. First, instead of learning a new skill from scratch, the learning agent in this scenario needs to revise an existing skill or strategy. Second, whereas most work in the computational modeling of skill acquisition has focused on how to make use of positive outcomes, the adaptation scenario requires mechanisms for learning from errors, mistakes, and other types of negative feedback (Ohlsson, 2008).

In past work, we developed a mechanism for learning from negative outcomes called constraint-based specialization (Ohlsson, 1993, 1996, 2007). This mechanism assumes that the agent has access to declarative knowledge in the form of constraints, where a constraint consists of an ordered pair <Cr, Cs> of a relevance criterion Cr and a satisfaction criterion Cs. Unlike propositions, constraints do not encode truths, but norms and prescriptions, e.g., traffic laws. A speed limit does not describe how fast drivers are going, but specifies the range within which their speeds ought to fall. Constraints support evaluation and judgment rather than deduction or explanation. In a constraint-based system, the architecture matches the relevance criteria of all constraints against the current state of its world in each cycle of operation. For constraints with matching relevance criteria, the satisfaction criteria are also matched. Satisfied constraints require no response, but a constraint violation signals a failed expectation (due to a change in the world or to incomplete or erroneous knowledge); this is a learning opportunity. The purpose of the change triggered by a constraint violation is to revise the current skill or strategy in such a way as to avoid violating the same constraint in the future. The computational problem involved in unlearning an error is to specify exactly how to revise the relevant skill when an error is detected. The constraint-based specialization algorithm is a general solution to this problem (Ohlsson & Rees, 1991).
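To make the two-part constraint representation and the per-cycle matching regime concrete, the following is a minimal sketch in Python. The names (State, Constraint, check_constraints) and the encoding of world states as sets of literals are our illustrative assumptions, not the actual representations used in HS or ICARUS.

# A minimal sketch of the <Cr, Cs> constraint representation and the
# per-cycle matching loop described above. All names and the propositional
# state encoding are illustrative assumptions, not taken from HS or ICARUS.
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

State = FrozenSet[str]  # a world state as a set of ground literals

@dataclass(frozen=True)
class Constraint:
    name: str
    relevance: Callable[[State], bool]     # Cr: when does the constraint apply?
    satisfaction: Callable[[State], bool]  # Cs: what ought to hold when it does?

def check_constraints(state: State, constraints: List[Constraint]) -> List[Constraint]:
    """Return the constraints violated in `state`: relevant but unsatisfied.
    Irrelevant constraints are ignored; relevant, satisfied ones need no response."""
    return [c for c in constraints
            if c.relevance(state) and not c.satisfaction(state)]

# Example: a speed limit does not describe the agent's speed; it prescribes
# the range within which the speed ought to fall.
speed_limit = Constraint(
    name="highway-speed-limit",
    relevance=lambda s: "on-highway" in s,
    satisfaction=lambda s: "speed-within-limit" in s,
)

assert check_constraints(frozenset({"on-highway", "speed-within-limit"}),
                         [speed_limit]) == []             # satisfied: no response
assert check_constraints(frozenset({"on-highway"}),
                         [speed_limit]) == [speed_limit]  # violation: learning opportunity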
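The specialization step itself can be sketched in the same style, under the simplifying assumption that rule conditions and constraint criteria are sets of literals: when a rule's action leads to a violation, the rule is replaced by narrower variants that incorporate information from the violated constraint, so the revised skill no longer fires in the offending situations. The Rule and specialize names and the two-variant scheme shown here are illustrative, not the exact algorithm of Ohlsson and Rees (1991).

# A sketch of specializing a faulty rule after a constraint violation, in the
# spirit of constraint-based specialization. The two-variant scheme and all
# names are illustrative assumptions, not the published algorithm.
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Pattern = FrozenSet[str]

@dataclass(frozen=True)
class Rule:
    name: str
    conditions: Pattern  # literals that must hold for the rule to fire
    action: str

def specialize(rule: Rule, relevance: Pattern, satisfaction: Pattern) -> Tuple[Rule, Rule]:
    """Replace a rule whose action violated a constraint with two narrower
    variants: one that fires only when the constraint is irrelevant, and one
    that fires only when it is relevant and already satisfied. Neither
    variant can violate the same constraint again."""
    when_irrelevant = Rule(rule.name + "/irrelevant",
                           rule.conditions | {"not " + lit for lit in relevance},
                           rule.action)
    when_satisfied = Rule(rule.name + "/satisfied",
                          rule.conditions | relevance | satisfaction,
                          rule.action)
    return when_irrelevant, when_satisfied

# Example: pushing violated a "fragile items need a padded surface" constraint.
push = Rule("push-object", frozenset({"holding(obj)"}), "push(obj)")
irr, sat = specialize(push,
                      relevance=frozenset({"fragile(obj)"}),
                      satisfaction=frozenset({"padded(surface)"}))
# irr now fires only for non-fragile objects; sat only onto padded surfaces.

Because both variants strictly tighten the original conditions, the error is unlearned without discarding the skill as a whole.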
The constraint-based specialization mechanism was previously implemented in HS, a production system architecture (Ohlsson, 1996). The HS system was limited along several dimensions. First, HS did not explicitly represent or take into account the hierarchical