The metacognitive loop I: Enhancing reinforcement learning with metacognitive monitoring and control for improved perturbation tolerance

Maintaining adequate performance in dynamic and uncertain settings has been a perennial stumbling block for intelligent systems. Yet any system intended for real-world deployment must be able to accommodate unexpected change—that is, it must be perturbation tolerant. We have found that metacognitive monitoring and control—the ability of a system to monitor its own decision-making processes and ongoing performance, and to make targeted changes to its beliefs and action-determining components—can play an important role in helping intelligent systems cope with the perturbations that are the inevitable result of real-world deployment. In this article we present the results of several experiments demonstrating the efficacy of metacognition in improving the perturbation tolerance of reinforcement learners, and discuss a general theory of metacognitive monitoring and control, in a form we call the metacognitive loop.

This research is supported in part by the AFOSR and ONR.
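
To make the monitoring-and-control idea concrete, the sketch below wraps a tabular Q-learner in a simple metacognitive layer: it watches its own recent reward against a running expectation, flags a perturbation when the two diverge, and responds by restoring exploration and learning-rate plasticity. This is a minimal illustration, not the paper's implementation; the particular anomaly test, the particular response, and names such as MetacognitiveQLearner, window, and tolerance are assumptions made for the example.

import random
from collections import defaultdict, deque


class MetacognitiveQLearner:
    """Tabular Q-learner with a simple self-monitoring layer (illustrative sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1,
                 window=50, tolerance=1.0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.tolerance = tolerance            # how far reward may drop before reacting
        self.q = defaultdict(float)           # Q[(state, action)]
        self.recent = deque(maxlen=window)    # monitoring: recent rewards
        self.expectation = None               # monitoring: slowly updated reward expectation

    def act(self, state):
        # Epsilon-greedy action selection over the tabular Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        self._monitor_and_control(reward)

    def _monitor_and_control(self, reward):
        # Monitor: track recent performance and a long-run expectation of it.
        self.recent.append(reward)
        recent_avg = sum(self.recent) / len(self.recent)
        if self.expectation is None:
            self.expectation = recent_avg
        self.expectation = 0.99 * self.expectation + 0.01 * recent_avg
        # Assess: a large drop relative to the expectation signals a perturbation.
        if len(self.recent) == self.recent.maxlen and \
                recent_avg < self.expectation - self.tolerance:
            # Control: make a targeted change -- re-open exploration and raise the
            # learning rate, then reset the expectation to the new performance level.
            self.epsilon = min(1.0, self.epsilon * 2.0)
            self.alpha = min(1.0, self.alpha * 2.0)
            self.expectation = recent_avg

Without the metacognitive step, a converged learner whose exploration and learning rate have decayed adapts to a changed reward structure only slowly; the point of the monitoring layer is to detect that change explicitly and deliberately restore plasticity in response.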
