The metacognitive loop I: Enhancing reinforcement learning with metacognitive monitoring and control for improved perturbation tolerance

Maintaining adequate performance in dynamic and uncertain settings has been a perennial stumbling block for intelligent systems. Yet any system intended for real-world deployment must be able to accommodate unexpected change—that is, it must be perturbation tolerant. We have found that metacognitive monitoring and control—the ability of a system to monitor its own decision-making processes and ongoing performance, and to make targeted changes to its beliefs and action-determining components—can play an important role in helping intelligent systems cope with the perturbations that are the inevitable result of real-world deployment. In this article we present the results of several experiments demonstrating the efficacy of metacognition in improving the perturbation tolerance of reinforcement learners, and discuss a general theory of metacognitive monitoring and control, in a form we call the metacognitive loop.

This research is supported in part by the AFOSR and ONR.
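
To make the monitoring-and-control idea concrete, the sketch below wraps a tabular Q-learner in a simple metacognitive layer: it watches its own recent reward against a running expectation, flags a perturbation when the two diverge, and responds by restoring exploration and learning-rate plasticity. This is a minimal illustration, not the paper's implementation; the particular anomaly test, the particular response, and names such as MetacognitiveQLearner, window, and tolerance are assumptions made for the example.

import random
from collections import defaultdict, deque


class MetacognitiveQLearner:
    """Tabular Q-learner with a simple self-monitoring layer (illustrative sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1,
                 window=50, tolerance=1.0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.tolerance = tolerance            # how far reward may drop before reacting
        self.q = defaultdict(float)           # Q[(state, action)]
        self.recent = deque(maxlen=window)    # monitoring: recent rewards
        self.expectation = None               # monitoring: slowly updated reward expectation

    def act(self, state):
        # Epsilon-greedy action selection over the tabular Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        self._monitor_and_control(reward)

    def _monitor_and_control(self, reward):
        # Monitor: track recent performance and a long-run expectation of it.
        self.recent.append(reward)
        recent_avg = sum(self.recent) / len(self.recent)
        if self.expectation is None:
            self.expectation = recent_avg
        self.expectation = 0.99 * self.expectation + 0.01 * recent_avg
        # Assess: a large drop relative to the expectation signals a perturbation.
        if len(self.recent) == self.recent.maxlen and \
                recent_avg < self.expectation - self.tolerance:
            # Control: make a targeted change -- re-open exploration and raise the
            # learning rate, then reset the expectation to the new performance level.
            self.epsilon = min(1.0, self.epsilon * 2.0)
            self.alpha = min(1.0, self.alpha * 2.0)
            self.expectation = recent_avg

Without the metacognitive step, a converged learner whose exploration and learning rate have decayed adapts to a changed reward structure only slowly; the point of the monitoring layer is to detect that change explicitly and deliberately restore plasticity in response.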
